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1. 


INTRODUCTION 


The  Naval  Space  Surveillance  Center  (NAVSPASUR)  currently 
tracks  daily  over  6000  objects  in  elliptical  orbits  around  the 
Earth.  To  assist  in  identification  and  tracking  of  these 
objects  in  orbit,  NAVSPASUR  uses  an  analytic  satellite  motion 
model  implemented  in  the  Fortran  subroutine,  PPT2.  This 
sxibroutine  predicts  an  artificial  satellite's  position  and 
velocity  vectors  at  a  selected  time  to  aid  in  the  tracking 
endeavor.  Several  calls  to  the  subroutine  may  be  required  to 
aid  in  the  identification  of  one  object. 

With  the  current  increase  in  space  operations,  the  ntjmber 
of  objects  necessary  to  be  tracked  is  expected  to  increase 
substantially.  Subroutine  PPT2  provides  orbit  prediction 
within  an  adequate  response  time  for  the  current  number  of 
tracked  objects  in  space.  However,  a  substeintial  increase  in 
the  number  of  objects  will  cause  the  use  of  PPT2  on  a  serial 
computer  to  become  less  responsive  and  computationally 
burdensome.  Additionally,  if  there  exists  a  desire  to 
increase  the  accuracy  of  the  NAVSPASUR  model,  the  resulting 
subroutine  would  require  even  more  computing  resources  and 
make  achieving  results  even  more  time  consioming. 

Parallel  computing  offers  one  option  to  decrease  the 
computation  time  and  achieve  more  real-time  results.  Use  of 
parallel  computers  has  already  proven  to  be  beneficial  in 
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reducing  computation  time  in  many  other  applied  areas . 
Parallel  computing  offers  an  opportunity  to  both  increase  the 
efficiency  of  the  current  model  or  reduce  the  computational 
burden  of  a  more  accurate  future  model . 

The  ultimate  objective  of  this  thesis  is  to  quantitatively 
determine  the  parallel  computing  potential  of  the  current 
NAVSPASUR  auialytic  model  and  determine  the  svibsequent 
reduction  in  computer  time  if  the  model  is  applied  to  a 
hypercube  mulcicomputor .  The  following  chapter  provides  a 
description  of  the  NAVSPASUR  satellite  model  and  outlines  the 
algorithm  used  by  the  Fortran  svibroutine,  PPT2 .  Chapter  III 
provides  an  overview  of  parallel  processing  and  discusses  the 
methods  to  decompose  a  serial  algorithm  to  be  applied  to  the 
hypercube.  In  Chapter  IV,  the  two  methods  of  decomposing  the 
analytic  model  are  presented  with  their  respective  success  in 
reducing  computation  time.  The  last  chapter  of  this  thesis 
provides  conclusions  and  suggestions  for  future  research. 


ZI.  NAVSPASUR  SATELLITE  MOTION  MODEL 


A.  OVERVIEW 

The  satellite  motion  model,  adopted  by  NAVSPASUR  and 
implemented  in  siibroutine  PPT2,  is  a  general  perturbations 
variation  of  elements  model  of  artificial  satellite  motion 
around  the  Earth.  Given  a  set  of  a  satellite's  "mean"  orbital 
elements  at  a  given  epoch,  the  model  predicts  the  state 
(position  and  velocity)  vector  at  a  future  time.  The  model 
considers  perturbing  accelerations  caused  by  atmospheric  drag, 
oblateness  of  the  Earth,  and  asymmetry  of  the  Earth's  mass 
about  the  equatorial  plane.  This  model  ignores  perturbations 
due  to  longitudinal  variation  in  the  Forth's  gravitational 
potential  and  the  influence  of  other  celestial  bodies  such  as 
the  Moon  or  the  Sun . ^ 

Satellite  motion  models  can  be  classified  by  the  technique 
used  to  integrate  a  satellite's  equations  of  motion  and  the 
method  to  describe  the  variation  of  the  satellite's  orbit  in 
reaction  to  the  perturbing  forces.  The  two  primary  techniques 
to  solve  satellite's  equations  of  motion  are  general 
perturbations  and  special  perturbations .  General  Perturbation 


^Although  the  NAVSPASUR  model,  implemented  by  PPT2, 
neglects  the  longitudinal  variation  to  the  Earth's 
gravitational  potential,  a  correction  for  this  variation  can 
be  made  within  PPT2  by  a  call  to  a  second  subroutine,  LUNAR. 
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techniques  involve  an  analytic  integration  of  the  perturbing 
accelerations,  while  special  perturbation  techniques  involve 
the  direct  numerical  integration  of  the  equations  of  motion  to 
include  all  perturbing  accelerations.  Typically,  general 
perturbation  techniques  are  more  difficult  and  not  as  accurate 
as  special  perturbation  techniques.  However,  special 
perturbation  techniques,  in  general,  provide  this  increase  in 
accuracy  at  a  cost  of  several  orders  of  magnitude  of  computing 
resources  (Bate,  1971,  pp.  385  -  414)  .  A  third  technique,  now 
gaining  in  popularity,  is  a  semi-analytic  technique.  This 
technique  is  combination  of  both  the  general  and  special 
perturbation  techniques.  The  NAVSPASUR  model,  a  general 
perturbations  model,  solves  the  satellite's  equations  of 
motion  by  using  series  solutions  to  the  ordinary  differential 
equations . 

Within  these  techniques  exists  two  methods  to  describe  the 
variations  to  a  satellite's  orbit.  One  method,  variation  of 
elements,  describes  variations  to  the  orbit  in  terms  of 
changes  in  the  osculating  orbital  elements  with  respect  to 
time.  The  other  method,  variation  of  coordinates,  selects  a 
coordinate  system  and  describes  variations  to  position  and 
velocity  in  this  coordinate  system  with  respect  to  time.  The 
disadvantage  of  the  variation  of  coordinates  method  is  that 
the  solution  provides  no  immediate  insight  to  the  geometry  of 
the  orbit  (Danby,  1909,  p.  319).  Using  the  variation  of 
elements  method,  the  NAVSPASUR  model  describes  the  variations 
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to  an  orbit  in  terms  of  changes  to  the  classical  orbital 
elements  with  respect  to  time. 

B .  THEORY 

The  NAVSPASUR  model  is  based  on  a  theory  developed  in  1959 
by  Dirk  Brouwer  of  Yale  University  (Brouwer,  1959,  pp.  378  - 
397)  auid  modified  by  R.  L.  Lyddane  of  the  U.  S.  Naval  Weapons 
Laboratory  in  1963  (Lyddane,  1963,  pp.  555  -  558)  .  This 
theory  considers  an  Earth' s  gravitational  potential 
significauicly  more  involved  than  the  gravitational  potential 
used  in  the  classical  Kepler  model  of  idealized  satellite 
motion.  The  classical  model  assxjmes  a  perfectly  spherical 
Earth  and  the  gravitational  potential  may  be  expressed  by 

(2.1) 

r 

where  n  is  the  gravitational  parameter  and  r  is  the  radial 
distance  of  the  satellite  from  the  center  of  a  spherical  Earth 
(Bate,  1971,  pp.  11  -  16).  The  theory  used  in  the  NAVSPASUR 
model  assumes  only  that  Earth  is  symmetrical  eJsout  the  north- 
south  axis.  Expressing  the  potential  in  spherical  harmonics, 
the  Earth' s  gravitational  potential  is  modeled  by 

U=ii  +  ii^  P„‘^(sinP)  IC„  ^cosg?i+S„  ,jSing\]  (2.2) 

^  ^  <1-0 

where  R®  is  the  eqpjatorial  radius  of  the  Earth,  p  is  the 
satellite  latitude,  X  is  the  longitude,  and  S„^,  are 
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coefficients  depending  on  the  mass  distribution,  and  are 
the  associated  Legendre  polynomials.  These  polynomials  are 
defined  in  terms  of  the  Legendre  polynomials  P„: 

Po(x)=l 

Pi  (x)  =x 

(X)  --^P„-2  M  (2 . 3) 

dx^ 

Ignoring  any  longitudinal  variation  in  the  gravitational 
potential,  Equation  2.2  may  be  simplified  to 

U=il:-il:V  ^J‘„P„{sinP)  (2.4) 

^  ^  n*l  -T 

where  J„=-C„^q.  The  even  J„' s  account  for  the  oblateness  of  the 
Earth,  while  the  odd  Jo's  account  for  the  Earth's  asymmetry 
about  the  equatorial  axis.  (Solomon,  1991,  p.  3) 

1 .  Brouwer' s  Model 

Dirk  Brouwer  developed  his  theory  of  artificial 
satellite  motion  while  under  contract  by  the  Air  Force 
Ceunbridge  Research  Center  and  published  this  theory  in  the 
Astronomical  Journal  in  1959.  In  this  article,  Brouwer  used 
a  different  notation  for  the  classical  orbital  elements  from 
the  notation  commonly  recognized  today.  This  notation  is  also 
adopted  in  the  later  NAVSPASUR  model.  In  order  to  conform  to 
the  notation  of  Brouwer' s  original  article  and  NAVSPASUR 
model,  this  paper  will  also  use  Brouwer's  notation  listed  in 
Table  2.1. 
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Tabl*  2.1  Brouwer's  Notation 


Brouwer' a  Notation  Common  Notation 


a 

semi-major  axis 

a 

e 

eccentricity 

e 

I 

inclination 

i 

g 

argument  of  perigee 

CO 

h 

ascending  node 

n 

f 

true  anomaly 

V 

1 

mean  anomaly 

M 

Brouwer's  model  considers  the  zonal  harmonics  of  the 
Earth's  gravitational  potential,  accounting  for  the  Earth's 
oblateness  and  its  asymmetry  about  the  equatorial  axis  as 
expressed  in  Equation  2.4.  To  simplify  the  potential 
ecpaation,  his  model  uses  only  the  first  four  non-zero  terms  of 
the  series  described  in  Equation  2 . 3  with  Ji=0 : 

C7=ii+il^i  (l-Ssin^P)  -  ^■?  (Ssinp-Ssin^P) 
r  2r* 

(3-30sin^P-t-35sin*P)  (2.5) 

3r’ 

(15sinP-70sin^P  +  63sin®P) 

0r* 

where 
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Figur*  2 . 1  Classical  Orbital  Elements 
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A3  o  =  -J3i^ 

A5.0  =  -J5^ 


This  truncation  of  the  series  introduces  an  error  of  the  order 
where  kj®. 4841605- 10"^‘ 0 .5^5.  (Brouwer,  1963,  p.  393). 

In  order  to  derive  the  equations  of  motion  for  the 
satellite,  Brouwer  utilized  the  Heunilton-Jacobi  theory  of 
dynzunical  systems  to  express  the  Hamiltonian  F*U-^v^  in  terms 
of  canonically  conjugate  Delaunay  variaO^les .  Letting  a  and  e 
be  the  osculating  semi-major  axis  and  eccentricity, 
respectively,  the  Delaunay  variables  are: 


L-\l^ 

G=\/na(l-e*) 

cosJ 


Ismean  anomaly 
gsargtment  of  perigee 
bslongitude  of  ascending  node 


Using  the  Delaunay  variables  in  Equations  2.6,  the  equations 
of  motion  become 


dL  9f 

dl 

dF 

■at  u  ' 

■at 

dG  dF 

dg^ 

dF 

~3t  1g  ' 

lat 

~3g 

dh  dF 

dh_ 

dF 

~3t  IE  ' 

•at 

■3fl 

where  the  heuniltonian  is 
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F= 


±L 

2L^ 


2  2G2^1^ 


(l-4^)-^cos(2g+2f)  ] 


(2.8) 


In  order  to  express  the  hauniltonian,  F,  in  terms  of  only  the 
Delaunay  variables,  Brouwer  used  the  following  Fourier  series 
expansions : 


The 
e . 


coefficients 


_ cos 

r^ 


P„  Q, 


i'i 


(2.9) 


(2g+2f)  =  5^  0jcos(2g+jl) 
j— 

are  power  series  in  the  eccentricity, 


Using  two  canonical  transformations  through  the  choice 
of  suitable  determining  functions,  S  zmd  S' ,  Brouwer  was  ad>le 
to  solve  the  system  of  ordinary  differential  equations  listed 
in  Equations  2.7  in  terms  of  the  mean  elements,  a",  e",  I", 
1",  g",  and  h"  .  The  transformed  hamiltonian,  F**,  depends  only 
on  the  transformed  variables  L",  G",  and  H" .  Replacing  F  by 
F**  in  Equation  Set  2.7,  Brouwer  found  that  L",  G",  and  H"  are 
constants  with  respect  to  time  and  1",  g",  and  h"  are  linear 
functions  of  time.  Consequently,  from  Equation  Set  2.6,  it 
follows  that  a",  e",  euid  I"  are  constant  with  respect  to  time. 
Additionally,  as  a  consequence  of  the  transformations  (see 
Brouwer,  1959,  pp.  379  -  393),  Brouwer  was  e±)le  to  separate 
the  changes  in  the  orbital  elements  with  time  into  secular, 
long  period,  and  short  period  variations .  Including  only 
secular  terms  up  to  order  0(kj2)  and  periodic  terms  to  order 


10 


©(kj),  Brouwer  found  that  the  secular  variations  are  a 
function  of  only  a",  e",  I”,  and  t;  the  long  period  variations 
are  a  function  of  a”,  e”,  I",  auid  g";  and  the  short  period 

variations  are  a  function  of  all  six  mean  elements . 
Additionally,  a,  e,  and  I  do  not  experience  any  secular 
variations  to  order  and  a  does  not  experience  any  long 

period  variation  to  order  O (kj)  .  Using  the  following 
constants, 

a." saemx-major  axis  constant 
e"  =€ccentricity  constant 
l"sinclination  constant 

sEarth' s  equatorial  radius 

and  notations. 


il=\/l-e^ 

0=cosl 

// 

^2  ^3  0 

777? 

*  Y's'Y,n  ‘ 

Y'.=Y.T|* 

Y'5  =  Y5TT'° 

Brouwer's  formulas  for  computing  the  perturbations  are:^ 


^In  order  to  avoid  confusion  over  notation,  let  5,,  5^, 
and  82  represent  the  secular,  long  period,  and  short  period 
variations,  respectively. 
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Secular  terms 


5 


4=-|r  2^1  (-1^30^) 

■"-^Y'2^[-15+16ti+25ti2+<30-96t]-90ti2)  0*+ (105+14T]+25ti2)  e<] 


+  ^Y'4'ne"*  (3-3O02+350*) 


(2.10) 


S.g=-|r2(-i+50^) 

■^-^Y'2^[ -35+2411+25112+  (90-19211-126112)  02+  (385+360ll+45ll2)  0<] 
^^Y'4  [21-9112+ (-270  +  126112)  02+ (385-189112)  e<] 

(2.11) 


6,h=-3Y'j0  +  -|Y'j"[  (-5+1211  +  9112)0+ (-35-3611-5112)  02] 
+  .|y'4  (5-3112)  (3-702)0 

Long  period  terma 


6ie={^  Y' 2®' 'Tl^  [  1 -1 102 J 

^  (1-502)  ^ 

ii  oo32g" 

^Yz 

{4*3e"^)  [1-902- ...  ]}  sing" 

^  Yz  (1-502)  ^ 


(2.13) 


6,I=- 


e''6,e 

ll^tanX" 


(2.14) 
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+  +  [l-9e^-J^]]co3g" 

jQ"  4  64  1-50^ 

+ _2£]Ll  e ' '  T|  ^  s  i  n  I "  { 1  -  5  0  *  - )c  o  s  3  g ' ' 

38477  1-50^ 


(2.15) 


[  (2-e"^)  -11  (2.3e-^)  0..iO(2;^5^^ 
16  1-50' 

__400e^^j  (2+e''^)  -3  (2+3e"*)  0^ 

(l-5d^)^  3471 

_  8(2^5e"*)0^_  8Oe"^0« 

- -  (l-50i)i^  ^ 

^  1  U/  /  ainJ"  _  e''02  ^ 

16  sinJ" 


-  TIKI^'  ' 

+e"sinJ"  (26+9e"^)  ]  (1-90^- 


240‘ 

1-502 


(2.16) 


.i^e"e>ainl"  (4.3e''>)  [3..J^._l^|^])cos9" 

*_?il!l(-i(e"3inl"  (3.2e"‘)  -  '  [l-59‘- 

576y^j  ^  sinJ'^  1-502 

+e"^023inT"  [5+-ll^  + _ ]  )cos3g" 

1-502  (1-502)2 


5,h=e-0(-^  tll._!£!^._L0^] 

"  ^  1-502  (l-sFp 


5y'4  1602  ^  400* 

1377  1-502  (1-502; 
e"0|_)rS  ^_5Y%  ^ 

"477  sinl'^  16sinJ" 

ilijLisinJ"  (4+3e"*)  (3 


J  )sin2g" 


(4+3e"2)  [1-902- 


240*  . 

“sF 


1602  ^  400* 

1-502  (1-502) 


J  Icosg' 


Jl!l±e''^9\ _ L_  [1-502-  160*  j 

37677  3sInJ^  1-502 

-2_w/rc.  320^  .  800*  ,, _ 


^sinJ"  [5  + 


1-502  (1-502) 


_]  )cos3g" 


(2.17) 


Short  period  terms 


_//3  1 

6,a=a''Y'2[  (-1+302)  i  ) 

r'2 

+3  (1-02)  „  cos  (2g'+2f')  ] 
r' 


826  =  ^47!  <"1^302)  (. 
^3(1-02)  (  *  -J. 

r'2  iY 

»2/v/ 


.//3 


.COS  (2g'  *2f" )  ] 


-HZi  (1-02)  (3a"cos  {2g*  *f')  *e"co3  (2g'+3f')  1 
2e" 


5jl=-^03inl"  [3cos  {2g'^2f') 

*3e"co3  (2g'*f')  *e"co3  {2g'  *2f')  ] 


8j1  =  -4-U{2  (-1^302)  (  *  ti2*  «  ♦Dsinf' 
4e"  r'  r' 

~m  _// 

•^3(1-02)  (l-iL—'n*^  a  )gij^^2g' •*•£') 
r'2  r' 

j,//2  1 

+  (-S-;r^'+4)sin(2g'--3f')  ]) 
r'  J 


„  ■n2Y',,  a" 

82g=41i{2  (-1+302)  [  1^2^  3+1)  sinf/ 

4e  r'  iT 

a"^  a" 

+3(1-02)  [  (l-.i.  ti2+ 3  )3in(2g^+f^) 

r'2  r' 

J*"2  J.II  1 

+  (  *  t^2^  a  j  sin(2g'+3£')  ]} 

r'*  r'  3 

y! 

+-Lji{6  (-1+502)  (f'-i' +e"sin£0 

+  (3-502)  (3sin(2g'+2f') 

+3e"sin  (2g'  +£')  +e"sin  (2g'+3£') ) 


(2.18) 


(2.19) 


(2.20) 


(2.21) 


(2.22) 
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(2.23) 


62h*-^e{6  (f'  -1'  +e"sinf' )  -  (3sin  (2gr'  +2f' ) 

+3e"sin  (2g'  +f' )  +e"sin  (2g'  +3f')) 

Using  Equations  2.10  through  2.23,  Brouwer's  algorithm 

for  predicting  a  satellite's  state  vector  is  as  follows: 

•  Begin  with  mean  elements  a",  e",  I",  Iq",  go"f  ho". 


•  Form  mean  motion  ng. 


(2.24) 


•  Propagate  "mean"  mean  anomaly  1",  mean  argument  of  perigee 
g",  and  mean  ascending  node  h"  using  Equations  2.10  - 
2.12. 

^«=gr/+notfi,g  (2.25) 


•  Apply  long  periodic  corrections  to  1",  g",  and  h"  using 
Equations  2.15  -  2.17  and  compute  the  long  period 

variations  5ie  and  using  Equations  2.13  and  2.14. 

i'=2"+6ii 

g'=g"^b^g  (2.26) 


•  Solve  Kepler's  Equation  for  the  eccentric  anomaly  E' , 
using  1'  and  e"  and  compute  the  true  anomaly,  f'  and 
radius,  r'  . 


E'-e"sxnE'-l" 
f'  d^e") 


tan— = 
2 


il-e")  2 


(2.27) 


l+e^ccsf ' 


•  Apply  short  period  variations  to  1',  g' ,  h' ,  a",  e",  and 
I"  using  Equations  2.18  -  2.23. 
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h=h'+b^h 

a-a"+b^a 

e=e"^h^e+h^e 


(2.28) 


•  Solve  Kepler's  Equation  for  E,  using  e  and  1  and  compute 
f  and  r . 


tan— = 


E~es±nE=l 
f 


2  N 


ad-e 


4114  tan^ 

(1-e)  2 


l+ecosf 


(2.29) 


•  Compute  the  position  vector  in  the  conventional  manner. 

x=r  [cos  (g+f)  cosA-sin(g+f)  sinAcosJ] 

y-r  [cos  (g+f)  sini3+sin(g+f)  coshcosi]  (2 . 30) 

z=rsin  (g+f)  sini 

In  addition  to  accounting  for  perturbations  only  due 
the  Earth' s  oblateness  and  as3^mmetry  about  the  equatorial 
axis^  this  model  has  several  shortcomings  which  Brouwer 
addressed  in  his  article.  The  first  is  a  singularity  at 
critical  inclination  (I^='Cos"’' (l/Vs)  =63. 4°)  for  the  long  period 
corrections  since  many  of  the  terms  have  a  divisor  of 
(5cos^I-l)  .  Second  is  a  singularity  for  very  small 

eccentricities  (near  circular  orbits) .  This  singularity  is 
due  to  the  appearance  of  e"  as  a  divisor  in  the  short  period 
terms.  Finally,  there  exists  a  singularity  in  some  of  the 
elements  for  very  small  inclinations  (orbits  lying  in  the 
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ec[uatorial  plane)  .  Brouwer  suggested  that  although 
singularities  in  some  of  the  elements  existed  for  very  small 
eccentricities  and  inclinations,  no  such  singularity  existed 
in  the  coordinates.  Hence,  the  formulas  could  be  modified  and 
expressions  obtained  for  the  perturbations  in  coordinates  for 
these  special  cases . (Brouwer, 1959,  p.  393) 

2  .  Lydaxme' s  Modifications 

Lydanne's  modifications  correct  for  the  singularities 
in  Brouwer' s  model  due  to  the  very  small  eccentricities  and 
inclinations.  He  presented  the  suggested  modifications  in  an 
article  in  the  Astronomical  Journal  in  1963. 

As  Brouwer  suggested  in  his  article,  Lydanne  and 
several  other  investigators  believed  there  existed  well- 
determined  expressions  for  the  coordinates  of  a  satellite  in 
the  case  of  either  very  small  eccentricity  or  inclination, 
because  no  singularity  actually  existed  in  the  coordinates  for 
the  small  eccentricity  or  inclination.  However,  Lydanne 
encountered  difficulty  in  applying  the  approach  suggested  by 
Brouwer.  This  approach  requires  the  Taylor  series  expansion 
of  the  coordinates  in  the  element  perturbation.  Although  the 
first-order  terms  were  regular,  the  higher  order  terms  were 
singular.  (Lyddane,  1963,  p.  555) 

Lyddane  suggested  another  approach.  By  formulating 
the  perturbation  theory  in  terms  of  Poincare'  variables 
instead  of  Delaunay  variables,  the  singularities  can  be 
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avoided.  The  Poincare'  variables  are  defined  in  the  terms  of 
the  Delavinay  variables  as  follows : 

x^=L  y^=l+g^h 

X2=y2  (L-G)  COS  (g+h)  y^=-yj2  {L-G)  si.n{g*h)  (2.31) 

(G-H)  cosh  y^=-yJ2  [G-H)  sinA 


Using  a  method  similar  to  Brouwer' s  method  of 
solution,  Lyddane  presented  with  some  algebraic  manipulation 
the  following  formulas : 


a-a"*ba 

l*g*h=l"+g"+h"+b  {.1-^g+h) 


(2.32) 


ecosl  -  (e"  ■*-6e)  cosl"  -e"  hlsinl"  (2.33) 

esxnl-  {e"  +5e)  sinl"  +e"52cosl" 


T  t//  Tff  §T 

sin(^)  cosh=  [sin  (-^)  +co3  (-y)  -j.]cosh" 


-// 


-sin  ( Z— )  bhsinh 


n 


T  ^  x^  ^  5x 

sin  (.j)  sinh=  [sin  (— ^)  '^cos  (— j-)  -j-J  sinh" 


(2.34) 


X'  ^ 

+sin  (— ^)  5hcosh' 


n 


where 

5  =5,  -82 

Additionally,  Lydcinne  discovered  that  the  use  of  1"  instead  of 
1'  in  Equations  2.21  and  2.22  introduces  an  error  of  at  most 
order  0  (k^^)  .  Since  Brouwer  model  computes  long  period  terms 
to  only  to  order  ©(kj),  Lyddame  suggested  that  the  short 
period  corrections  may  be  computed  using  1"  instead  of  1' 
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(Lydanne,  1963,  p.  557) .  Lyddane  claimed  this  approach  would 
overcome  the  e"=0  and  I"=0  singularities  in  the  periodic 
variations.  His  approach  removed  the  singularity  e"=0  for  all 
terms  and  the  singularity  at  I"=0  for  all  terms  except 
(Equation  2.14)  (Solomon,  1991,  p.  8). 

Lyddane' s  algorithm  for  propagating  an  element  set  is 
as  follows: 


•  Compute  1",  h",  5e,  e"5l,  and  5l  using  Brouwer's  formulas 
(Equations  2.10  -  2.23).  To  avoid  a  small  difference 
between  two  large  quantities,  use  the  following  identities 
in  Equation  2.19  to  compute  5e . 


(A,) 


1*1 


■^2cosf"*2e"cos^f"+e"^cos^f") 


{e"*2cosf" 

Tl* 

+3e"cos^f  ") 


(2.35) 


•  Compute  a  and  1+g+h  using  Equation  Set  2.32.  To  reduce 
amount  of  error  introduced  by  finite  precision,  combine 
Equations  2.15  -  2.17  for  (1+g+h)  and  Equations  2.21  - 
2.23  for  82  (1+g+h). 


•  Compute  s  in  ( V6I " )  5h . 


7"// 

sin  ( — )  6h= 
2 


si.nl"bh 

tH 

2cos  (  ^  ) 


(2.36) 


•  Solve  for  e,  I,  1,  and  h  using  Equation  Sets  2.33  and 
2.34. 

•  Compute  coordinates  and  velocity  components  in  the  usual 
manner . 
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coshcos (g+f) -sinhsin{g+f) cosi 
sinhcos (g+f)  +coshsin ig+f) cos J 
sin  (g+f)  sinT 


-coshsln {g+f) -sinhcos {g+f) cosI 
-sinhsin {g+f)  +coshcos {g+f)  cosi 
cos {g+f) sini 


f-rf 


^  [  (esinf)  f+  (1+ecosf)  <►] 

Ti^ 


(2.37) 


3.  NAVSPASUR  Modifications 
a .  Atmospheric  Drag 

Brouwer's  model  considers  only  the  perturbations 
due  to  zonal  harmonics  (Earth' s  oblateness  and  asymmetry  about 
the  equatorial  axis) •  For  near-earth  orbits,  the  magnitude  of 
the  perturbative  acceleration  due  to  atmospheric  drag  can  be 
of  the  same  order  as  the  magnitude  of  the  perturbative 
acceleration  due  of  the  earth's  oblateness  (Knowles,  1992,  p. 
226).  Figure  2.2  shows  the  relative  orders  of  magnitude  of 
the  perturbative  accelerations  at  various  altitudes  above  the 
Earth.  Fluctuations  in  the  density  emd  relative  velocity  of 
the  atmosphere  to  the  satellite  make  the  perturbative 
acceleration  due  to  atmospheric  drag  difficult  to  model .  To 
compensate  for  the  drag  perturbation,  the  NAVSPASUR  model 
utilizes  a  simplified  sub-model  for  drag  (Solomon,  1991,  pp. 
9  -  10) .  Atmospheric  drag  is  modeled  by  time  derivatives  of 
the  "mean"  mean  anomaly,  1",  using  two  parameters  and  M3: 
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non:  typical  satallita 
ctia/actaristics 


Figure  2.2  Perturbative  Accelerations  on  Satellites 
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Gaosynchronou* 


(2.38) 


2  "= i  "q  +m  C  +Af2 

where  t  is  the  elapsed  time  from  epoch.  Using  the  following 
relationships , 

m=no  (1+6^-^)  »no 

and 


it  can  be  assumed 


a 


(2.39) 


Differentiating  Equation  2.39  and  working  to  first  order. 


X.  2  m 


(2.40) 


Using  ih»2M2  from  Equation  2.38  and  substituting  into  Equation 
2.40  yields  following  model  for  the  drag  effect  on  the  semi¬ 
major  axis: 


(2.41) 

3/n  ^ 

Asstaming  a  spherically  symmetric  atmosphere  with  an 
exponential  density  variation 


p  ( r )  =Poexp  ( -  )  (2.42) 

H 

where  Po  is  the  density  at  radius  r=ro  and  H  is  the  scale 
height,  the  rates  of  change  in  the  semi-axis,  a,  and 
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eccentricity,  e,  with  respect  to  the  eccentric  anomaly,  E,  may 
be  expressed  as  (Danby,  1988,  pp.  330  -  331) : 


^^-a^ftpoexpC^^cosE)  J  ( i  ^ecosE) 

dE  °  H  N  l-ecos£: 

-^=-aTi6p(,exp  (i£®cosE) 


(2.43) 


l-t-ecosE* 
''J  1-ecosF 


cosE 


dE  ff 

where  6  sund  A  are  constants .  Estimating  the  average  change  in 
a  aind  e  by  integrating  Equation  Set  2.43  over  one  orbit  to 
lowest  order  of  e  yields : 


=  (2.44) 

Aa  a 

Assuming  CT=1  and  substituting  Equation  2.41  into  2.44,  the 
model  for  the  decay  in  eccentricity  is : 

(2. 45) 

a"  3m  ^ 

b.  Kmmr  Critics!  Inclinstion 

Neither  Brouwer  nor  Lyddane  were  aOOle  to  correct 
for  the  singularity  in  the  long  period  terms  at  the  critical 
inclination  (Io=cos"^  (l/Vs)  ==63 . 4®)  .  To  prevent  an  overflow 
error  in  the  subroutine  PPT2,  the  factor  (l-5cos*I")  is 
approximated  by  the  function 

l-exp(-lOOx^)  (2.46) 

X 

where  x=l-5cos^I".  However,  with  a  small  value  of  x,  T2 
cannot  be  computed  directly.  By  using  a  product  expansion  of 
Equation  2.46, 
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(2.47) 


1  10 

r2=— (1-exp  (-px^) )  (l+exp(-2'’Px2)  ) 

X  „.<3 

where  P«100/2“.  To  remove  the  factor  of  x  from  the 
denominator,  the  first  factor  is  approximated  by 

(l-exp(-px^) )  (.pn  PV"  (2.48) 

X  ^  in+1)  ! 

Figure  2.3  shows  a  comparison  of  T2  using  Equations  2.47  and 
2.48  and  1/x  in  vicinity  of  the  critical  inclination. 
(Solomon,  1991,  pp.  10  -  11) 

C .  PPT2 

PPT2  is  the  Fortraui  sxibroutine  which  implements  Brouwer' s 
model  with  Lyddane's  and  NAVSPASUR' s  modifications.  PPT2 
completes  several  satellite  propagation  tasks. 

•  Predicts  a  satellite  state  vector  at  future  time. 

•  Computes  partial  derivatives  of  the  position  vector  with 
respect  to  the  orbital  elements  (used  in  Method  of 
Differential  Correction  to  modify  set  of  stored  elements 
in  light  of  current  observations) . 

•  Predicts  the  time  and  state  vector  of  satellite  for  a 
given  true  emomaly. 

PPT2  is  composed  of  sections  which  accomplish  the  tasks 
named  ad^ove .  The  sections  are  delineated  by  conditional 
breakpoints.  Control  over  which  section  to  execute  is  handled 
by  a  set  of  control  variables.  Data  is  passed  to  the 
sxabroutine  through  three  control  variables  and  four  Common 
blocks.  A  complete  description  of  the  input/output  of  PPT2  is 
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Figur*  2 . 3  Near  Critical  Inclination 


contained  in  Appendix  A. 

NAVSPASUR  represents  each  tracked  object  by  a  set  of 
stored  elements.  The  stored  elements  are  the  meain  .rbital 
elements  plus  two  drag  parameters,  Mj  auid  M,.  The  stored 
orbital  elements  are  the  same  as  the  classical  orbital 
elements  used  in  Brouwer's  model  with  two  exceptions.  The 
mean  semi -major  axis,  a",  is  replaced  by  the  variable,  m,  the 
mean  "mean"  motion. 
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m=  (1+5,1)  no=  (1+6,1) 


(2.49) 


The  other  exception 

is 

that 

the 

cosine 

of 

the 

mean 

inclination,  cos  I", 

is  stored 

instead 

of 

the 

mean 

inclination,  I".  Thus 

the 

stored 

elements  are: 

m,  Mj,  M3, 

e",  g",  h",  and  cos  I". 

For  NAVSPASUR  to  use  the  formulas  from  Brouwer's  model, 
the  mean  semi -major  axis,  a",  must  be  determined  from  the  mean 
"mean"  motion,  m.  Since  a"  is  implied  in  the  8,1  (Equation 
2.10),  no  direct  solution  is  possible.  Therefore,  PPT2 
recovers  a"  using  an  initial  approximation  for  a"  auid  solving 
Equation  2.48  for  a"  iteratively.  The  initial  guess  for  a"  is 
ao"=m“*.  Then,  for  i=l,...,5 
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corrections  for  the  mean  argxament  of  perigee,  g",  and  mean 
ascending  node,  h"  .  The  secular  corrections  for  g"  and  h"  are 
computed  in  terms  of  mean  anomaly,  1",  instead  of  time  using 
6,g  amd  5,h  from  Equations  2.11  amd  2.12. 

dg"  dg" 

dg"  ^  S.g  (2.51) 

dl  dl"  m  m 
dt 


dh" 

dl" 


dh"  dh" 

~dr  _-3r  ^n,h,h^  h,h 
dl"  nt  m  ~maP^ 
~dt~ 


(2.52) 


Once  the  secular  corrections  to  the  "mean"  mean  anomaly  are 
computed  via  Equation  2.38,  Equations  2.51  and  2.52  are  used 
to  correct  g"  and  h" . 


where 


(2.53) 


Al  =mt 

With  all  of  its  conditional  breakpoints,  PPT2  completes 
only  those  tasks  required  by  the  user.  Prior  to  completing 
any  of  the  fore-mentioned  tasks,  the  user  is  required  to  make 
an  initial  call  to  sxibroutine  PPT2  to  recover  a"  and  compute 
the  secular  corrections.  During  this  initial  call,  many 
variad5les  are  set  which  will  be  used  in  subsequent  calls. 
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increasing  efficiency.  Therefore,  at  least  two  calls  to  PPT2 
are  required  to  complete  any  of  its  tasks. ^ 

Because  the  main  objective  of  this  thesis  is  the 
parallelization  of  the  satellite  state  vector  prediction  task 
of  PPT2,  only  this  algorithm  will  be  presented.  For  a 
complete  description  of  the  other  tasks  see  (Solomon,  1991) .* 
The  algorithm  implementing  the  NAVSPASUR  model  is  as  follows : 

•  Begin  with  the  stored  mean  elements  plus  the  drag 
correction  terms  do",  m,  Mj,  Mj,  eo",  go”  r  ho",  and  cos  I") 

•  Compute  T2  using  Equations  2.47  and  2.48. 

•  Recover  mean  semi -major  axis,  ao",  from  the  mean  motion, 
m,  by  iteration  using  Equation  Set  2.50. 

•  Compute  the  following  dimensionless  quantities: 


Y'3  =  Y3^'‘  Y'4  =  Y«^‘*  y5  =  Y5TT'° 


•  Compute  drag  corrections  for  the  semi -major  axis,  a",  and 
eccentricity,  e",  using  Equations  2.41  and  2.45. 

•  Using  Equation  2.38  and  Equation  Set  2.53,  propagate  the 
mean  anomaly,  1",  argument  of  perigee,  g",  and  the 
ascending  node,  h" ,  considering  only  the  secular 
corrections,  (h"  may  be  optionally  corrected  for  the 
Earth's  rotation  using  Equation  2.56,  where  0)  is  the 
Earth' s  angular  velocity  and  T^  is  the  time  at  which  the 
direction  of  the  Greenwich  meridian  and  equinox  direction 
coincides . 


^For  a  complete  description  of  calling  options  of  PPT2, 
see  (Solomon,  1991,  pp .  11  -  24). 

*A  complete  listing  of  subroutine  PPT2  is  contained  in 
(Solomon,  1991,  pp.  39  -  55) . 
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(2.5S) 


h''=V'*-gAI 
h"=h"-(0{t-T^)  [optional]  (2.56) 


•  Propagate  the  eccentricity,  e" ,  and  the  semi-major  axis, 
a". 


e"=min  [max  [0,  e^'+^t]  ,  .  99999] 
a" =max [ 1 , aj' +^t] 


(2.57) 


•  Solve  Kepler's  Equation  using  Steffensen's  Method  and  use 
this  result  to  compute  cosine  and  sine  of  the  true  anomaly 
and  radius . 


E" =1" ^e" sinE" 


cosf"= 

sinf"= 


cosE" ~e" 
l-e" cosE" 
v/l-e""  sinK" 
l-e"cosf:" 
a"vj‘ 

1+e" +cosf" 


(2.58) 


•  Using  Equations  2.13-2.17,  compute  long  period  correction 
terms  for  e,  1,  h,  and  I  in  the  following  forms  replacing 
g'  and  (1-50*)"'^  with  g"  and  T2,  respectively: 


Z^e-VLElcoslg"  ^VLEZsing"  +VL£3sin3g" 
e"5ii=Ti  {VLElsinZg" -VLLZcosg" -VLE3co33g" ) 
ainl"b^h=VLHll3in2g"  *VLH2Ico3g"  *VLa3Ico33g'^  (2.59) 
e"bj^e 


where 
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1  5y' 

VLEl=e"T\U-ty' T2]  - - Li- [1-302-80*-  T2]  ) 

8  2  12y  2 

VLE2=  {y'+Elj.  (4+3e"^)  [1-902-240*-  T2] ) 

4y^2 

VLE3=--IElj-e"^^3inl"  [1-502-160*-  T2] 

384y\  ' 

15y' 

VLL2=VLE2+ - Lie''^'n2sinl"  [1-902-240*-  T2] 

WI 


y' 

VLHlJ=e''^0sinJ'' (--a^  [11+8002-  r2+2OO0*-  T2^] 


5y\ 


T 


+ !_L  [3  +  1602-  1-2  +4  0  0*-  r22] ) 

12  2 

VLH2J=£^{y',  +  -5X4  (4+3e''*)  [1-902-240*-  T2] 


Ax 


15v^  sin^X^^ 

+ _ Li-g _ (4+3e"^)  [3+1602-  72+400*-  T2^]) 

VLH3J=-_iiLi.e"^0{i  [1-502-160*-  T2] 

576y'2  ^ 

+sin2j''  [5+3202-  r2+8O0*-  T2^]) 


*  Representing  the  quantity  (1+g+h)  by  z,  compute 
5iZ=6il+6ig+Oih  by  combining  individual  parts  prior  to 
computing. 

5jZ=VLSlsin2g"  +VLS2co3g"  *VLS3cos3g"  (2.60) 


where 


VL5J  =  -1!I^-1  [Y'2(1-11Q"-4O0*-  T2)  (1-302-80*-  T2)  ] 


-"20 


+  .f_g_Z[Y'2  (11+8002-  2-2+2000*-  722)  -y's  (3  +  1602-  72+400*-  72)  ] 


Y'3 


.//2 


+25e""0*-  72*(y'2--^)  [y'z  d -3  3  02-2  0  0  0*-  72) 

-YS  (1-902-400*-  T2)  ] 
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VLS2=e"3inl"{-^  [  +  [f  3  +  44^  (4+3e''') 

4y'j  1+11  16 

+  [3  (€"^-713+11]  [1-902-240^-  T2] 

+  iliLi0  (1-0)  [  (4+3e"^)  (3-*-160^-  72-^400*-  T2^)  l) 
32y'2 


(l-902-240«-  T2)  ] 


VL53=e''sinl"{_iili_[3Tl^-3-(2  +  _®  )  e"^]  [1-502-160"-  T2] 
1152^2 

35y' 

- _ LL  [e''^0  (1-0)  (5+4002-  r2-*-8O0"-  T2^)  ]) 

576y'2 


•  Using  Equation  2.19  and  Equation  Set  2.35,  compute  the 
short  period  correction  term  for  e,  replacing  g'  and  f' 
with  g”  and  f",  respectively.  Then  combine  long  and  short 
period  corrections  to  form  5e. 

52e=4-i{  ( -1+302)  [e'^^cos^f"  *3&" cos^f" 

+3cosf"+e'MTl+-^)  ]  (2.61) 

+3  (1-02)  [e"^co3^f"  ^3e"co3^f" 

^3co3f" *e"co3 (2g"*2f") ] 

-ti2(1-02)  [3e"cos  (2g"+f")  +e"cos  (2g" +3f " ) } 

56=616+626  (2.62) 


•  Using  Equations  2.20  and  2.21,  compute  the  short  period 
correction  t  crms  for  I  and  1  in  the  following  form 
replacing  g'  and  f'  with  g"  and  f",  respectively.  Then 
combine  respective  long  and  short  period  terms  to  form  6l 
and  e6l. 
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62r=-^0sinj"  [3cos  (2g"  +2f")  +3e"cos  {2g"  +f") 

^e"oo3{2g''  ^3f")  ] 

'52l  =  -^!l!jli{2(-l+3e^) 

X  [  cosf")  {2+e"cosf")  ^  ^  3±nf>f 

■n 

+3  (1-02) 

v[p-  {l-B"cosf")  (2^e"cosf") 

"H 

(l+e"cosf")  {2+e'^ cosf")  ^1. 

'  Tj  'J' 

X  sin(2g''+3f")  ]) 


(2.63) 


81=5^1+821 

e8l=e82l+e82l 


(2.64) 


•  Using  equation  2.23,  compute  short  period  correction  for 
h  in  the  following  form. 


sinl"5jh=-^sinl"0[6(f''-I"e"sinf") 

-3sin(2g"+2f") +3e"sin (2g" +f" ) 
+e"sin(2g"+3f") ) 


(2.65) 


•  Using  Equation  2.36  and  relationship  5h=5ih+62h,  compute 
sin(^6I")  5h. 


.  ,  5.  sini"  {82h+82h) 

sin  (__)  8h= - - - 

^  2cos(.J) 


(2.66) 


•  Combine  terms  of  52Z=52l+52g+52h,  replacing  g' ,  1'  ,  and  f' 
with  g",  1",  f",  respectively.  Then  compute  z. 


(Tl  +  rr— -1) 

52Z=-e''252J _ _ 

-2d  [6  (1-20-502)  {f"-e"sinf"-l'') 

4 

-(3+20-502)  (3sin(2g"+2f") 

+  3e''3in  {2g"  +2f" )  +e"sin  {2g"  +3f")  ] 


(2.67) 
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Z-=1"  +g"  +h" 


(2.68) 


•  Compute  a  using  Equation  2.18. 

a=a"+5ja  (2.69) 

•  Solve  Equation  Sets  2.33  and  2.34  explicitly  for  e,  1,  I, 
and  h. 


e=V (6e) (e5l) ^ 

-If  6esinl'' +e5lcosI"  . 
'■■^ecosi^^-eSlsinl^^^ 
cosI=l-2[  (5J)2+(sin(.^)  5h)2] 

6Jsinh"+sin  (.£)  6hcosh" 

h=t  em‘^  ( _ Z _ ) 

SJcosh" -sin  (.^)  5hsinh" 


•  Compute  g. 


g-z-l-h 


(2.71) 


•  Solve  Kepler's  Equation  again  using  Steffensen's  Method 
and  compute  cos  f,  sin  f,  and  r. 


E=I-*-esinE 


cosf= 


sinf= 


r= 


cosE-e 

l-ecosE 

vl-e*  sinE 
1 -ecosE 

1+e+cosf 


(2.72) 


•  Compute  the  satellite  state  vector  using  Equation  set 
2.37. 
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III.  PARALLEL  COMPUTING 


A.  OVERVIEW 

1 .  inition 

The  complexity  of  scientific  computing  today  demands 
faster  computers.  Greater  detailed  models  require  a 
substantial  amount  of  computation.  Faster  computers  are 
needed  to  provide  the  results  of  the  computation  in  a  timely 
manner.  In  response  to  this  demand,  computer  engineers  have 
taken  two  approaches  to  achieve  faster  performance . 

The  first  approach  is  to  increase  the  speed  of  the 
circuitry.  Although  great  advances  have  been  made  in 
increasing  the  speed  of  computer  circuitry,  this  increase  in 
speed  is  bounded  by  the  speed  of  light.  Additionally,  the 
specific  design  and  manufacture  requirements  for  further 
increases  in  speed  are  quite  costly. 

The  second  approach,  parallel  computing,  provides  an 
alternate  means  to  achieve  faster  computer  performance  using 
affordable  circuitry  design.  Many  articles  and  books  have 
been  written  describing  the  methods  to  exploit  this  approach. 
The  terms  parallel  computing  ajid  parallel  processing  seem  to 
be  used  interchangead^ly  in  these  texts .  For  the  purpose  of 
this  thesis,  parallel  computing  and  parallel  processing  are 
assxamed  to  be  synonymous.  In  this  emerging  field,  there 
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exists  slight  differences  in  how  to  define  parallel  computing. 

One  definition  which  best  encompasses  the  breadth  of  the  field 

may  be  taken  from  (Hwang,  1984,  p.  6) : 

Parallel  processing  is  an  efficient  form  of  information 
processing  which  emphasizes  the  exploitation  of  concurrent 
events  in  the  computing  process .  Concurrency  implies 
parallelism,  simultaneity,  and  pipelining.  Parallel 
events  may  occur  in  multiple  resources  during  the  seime 
time  interval;  simultaneous  events  may  occur  at  the  saune 
time  instant;  and  pipelined  events  may  occur  in  overlapped 
time  spans .  These  concurrent  events  are  attaine±)le  in  a 
computer  system  at  various  processing  levels . 

In  his  book,  Hwang  describes  the  processing  levels  to 

be:  the  program  level,  the  task  level,  the  inter-instruction 

level,  and  the  intra-instruction  level.  The  program  level 

involves  executing  multiple  prograuns  by  means  of 

multiprogramming,  time  sharing,  amd  multiprocessing.  This 

level  is  concerned  with  the  design  of  parallel  processing 

systems  which  is  beyond  the  scope  of  this  thesis.  Therefore, 

for  the  purposes  of  this  paper,  the  definition  of  parallel 

computing  is  defined  as  the  efficient  form  of  information 

processing  emphasizing  the  concurrent  computations  and 

manipulation  of  data  to  solve  a  single  problem. 

2 .  Classification  of  Parallel  Computers 
a.  Type  Classifications 

Implicit  in  the  definition  of  parallel  computing 
are  three  methods  to  achieve  parallelism.  The  three  methods 
are  temporal  parallelism,  spatial  parallelism,  and 
asynchronous  parallelism  (Hwang,  1984,  p.  20)  .  These  methods 
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offer  a  manner  to  classify  the  various  types  of  parallel 
computers . 

The  first  type  is  a  pipeline  computer.  Pipeline 
computers  perform  overlapped  computations  to  exploit  temporal 
parallelism.  Computations  are  divided  into  a  number  of  stages 
or  segments  with  the  output  of  one  segment  being  the  input  of 
another.  Analogous  to  a  factory  assembly  line,  if  each 
segment  works  at  seune  speed,  the  work  rate  of  the  pipeline  is 
the  s\im  of  work  rates  of  the  segments .  The  maximum  work  rate 
is  achieved  once  the  pipeline  is  full .  An  example  of  a 
pipeline  computer  is  the  Cray— 1 . 

The  second  type  is  an  array  processor.  Array 
processors  use  multiple  synchronized  processing  elements  to 
achieve  spatial  parallelism.  Each  processing  element  performs 
simultaneously  identical  operations  on  different  data.  An 
example  of  an  array  processor  is  the  Connection  Machine. 

The  third  type  is  a  multiprocessor. 
Multiprocessors  may  achieve  asynchronous  parallelism  through 
a  set  of  interactive  processors  (nodes) .  These  processors  are 
capable  of  performing  independent  operations,  but  share 
resources  such  as  memory.  An  example  of  a  multiprocessor  is 
the  Cm*  of  Carnegie-Mellon  University. 

The  final  type  is  a  refinement  of  the 
multiprocessor,  the  multicomputer.  Multi computers,  like 
multiprocessors,  achieve  asynchronous  parallelism  through  a 
set  of  interactive  processors.  But  these  processors  each  have 
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their  ovm  local  memory.  An  excunple  of  a  multicomputer  is  the 
INTEL  iPSC  hypercube.  Because  each  processor  has  its  own 
memory  and  may  perform  independent  operations,  multi computers 
offer  the  user  an  added  degree  of  freedom  in  programming. 
However,  interaction  between  the  processors  (nodes)  may 
require  synchronization  to  be  explicitly  programmed  in  the 
multicomputer  code. 

The  four  type  classifications  are  not  necessarily 
mutually  exclusive.  Many  commercially  available  array 
processors,  multiprocessors,  and  multicomputers  employ 
pipeline  processors  to  complete  operations  such  as  vector 
processing. 

Jb.  Architectural  Classifications 

Parallel  computers  may  also  be  classified 
according  to  their  architecture.  One  scheme  for  classifying 
digital  computers  was  introduced  by  Michael  J.  Flynn  in  1966. 
He  introduced  a  scheme  to  clasrify  computers  into  four 
categories  based  on  the  multiplicity  of  instruction  and  data 
streams.  An  instruction  stream  is  a  sequence  of  instructions 
to  be  executed  by  the  computer.  Likewise,  a  data  stream  is  a 
sequence  of  data  used  by  the  computer,  Flynn's  four 
categories  are  (Flynn,  1966) : 

1.  Single  instruction  stream,  single  data  streeim  (SISD)  . 
Most  serial  computers  fall  in  the  SISD  category. 
Although  instructions  are  completed  sequentially,  this 
category  includes  overlapping  instructions  (pipelining) . 
Therefore,  pure  pipeline  processors  also  belong  to  this 
category . 
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2.  Single  instruction  streaun,  multiple  data  stream  (SIMD)  . 
Array  processors  fall  into  this  category.  The  array 
processor  receives  a  single  set  of  instructions,  but  each 
element  receives  and  manipulates  its  ovm  set  of  data. 

3.  Multiple  instruction  streaun,  single  data  streaim  (MISD)  . 
No  current  computers  fall  into  this  category.  This 
architecture  has  been  challenged  as  impractical  by  some 
computer  designers  (Hwang,  1984,  p.  34) . 

4.  Multiple  instruction  stream,  multiple  data  streaun  (MIMD)  . 
Most  multiprocessors  and  multicomputers  fall  into  this 
category.  The  INTEL  iPSC  is  a  MIMD  machine. 

c.  Topological  Clss3i£icotion8 

Another  classifying  scheme  for  parallel  computers 

is  by  the  topology  of  the  inter-processor  connections .  These 

connections  are  the  means  through  which  commun Section  between 

individual  processors  is  conducted.  This  classifying  scheme 

applies  only  to  array  processors,  multiprocessors,  and 

multicomputers.  Some  of  the  general  topologies  are  the  mesh, 

the  pyraunid,  the  butterfly,  and  the  hypercube.  Figures  3.1 

and  3.2  show  examples  of  the  mesh  and  hyperevibe  topologies. 

The  topology  may  also  be  customized  to  meet  specific  computing 

needs .  For  a  more  comprehensive  discussion  of  the  various 

topologies  see  (Quinn,  1987,  pp.  25  -  30) . 

3 .  Measurements  of  Performance 

With  faster  computation  speed  being  the  ultimate 
objective,  certain  measures  are  needed  to  determine  the 
effectiveness  of  parallel  computing  versus  serial  computing  to 
achieve  this  objective.  Computation  speed  depends  on  many 
factors  that  include  the  computer  hardware  design,  the 
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—  -  -  * 

Tigur«  3 . 1  Two'-dimensional  meshes 

technical  specifications  of  its  components,  emd  the  algorithm 
or  method  of  solution  used  to  complete  the  computations .  Two 
common  measures  of  effectiveness,  accounting  for  both  the 
hardware  amd  the  algorithm,  are  speedup  auid  efficiency. 

Speedup,  Sp,  refers  to  the  ratio  between  the  time 
taken  to  execute  a  set  of  computations  serially,  T,,  aind  the 
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Figure  3.2  Hypercubes  of  dimension  zero  through  four 

time  taken  to  complete  the  saune  set  of  computations  exploiting 
parallelism,  Tp, 


(3.1) 
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Although  speedup  compares  the  time  taken  for  the  serial 
computer  program  aind  the  parallel  computer  to  complete  the 
scune  set  of  computations,  this  set  does  not  imply  both 
prograuns  follow  the  same  algorithm.  Parallel  progreuns  often 
contain  additional  operations  to  accommodate  parallelism.  In 
order  not  to  be  misleading,  speedup  should  compare  the 
parallel  computer  program  with  the  most  efficient  serial 
computer  program.  Many  suggest  that  times,  Tp  and  T,,  be 
measured  using  a  particular  parallel  computer  and  the  fastest 
serial  computer.  However,  the  variation  in  the  technical 
specifications  of  both  computers  may  cloud  the  issue  whether 
parallel  processing  is  beneficial.  To  be  an  effective 
measure,  the  computing  technical  specifications  of  the 
individual  processor  of  the  parallel  computer  amd  the  serial 
computer  should  be  equal.  Therefore,  for  the  purpose  of  this 
thesis,  speedup,  Sp,  is  measured  by  the  ratio  of  time,  T^, 
taken  by  the  parallel  computer  executing  the  most  efficient 
serial  algorithm  and  the  time,  Tp,  taken  by  the  same  parallel 
computer  executing  the  parallel  algorithm  using  p  processors. 


The  other  measure,  efficiency,  accounts  for  the 
relative  cost  of  achieving  a  specific  speedup.  Relative  cost 
is  measured  as  the  number  of  processors  recpiired  to  achieve 
the  speedup.  Efficiency,  Ep,  is  the  ratio  between  the 


speedup,  Sp,  and  the  number  of  processors,  p  (the  theoretical 
speedup) . 


(3.3) 


Many  factors  could  possibly  limit  the  possible  speedup 
and  efficiency  of  a  parallel  program-  These  factors  include 
the  number  of  sequential  operations  that  cannot  be 
parallelized,  the  communication  time  between  individual 
processors,  and  the  time  each  processor  is  idle  due  to 
synchronization  requirements.  Many  have  argued  these  factors 
severely  restrict  the  benefits  of  parallel  computing.  Despite 
these  factors,  research  has  shown  parallel  computing  can  be  an 
effective  means  to  reduce  computation  time  (see  Quinn,  1987, 
pp.  18  -  20  and  Gustafson,  1988)  .  Considering  only  the  number 
of  sequential  operations  in  a  program  that  cannot  be 
parallelized,  Amdal's  Law  states  that  the  maximum  speedup,  Sp, 
achieveU&le  by  p  processors  is : 


P  f+(l-f)  /p 


(3.4) 


where  f  is  the  fraction  of  operations  that  must  be  performed 
sequentially  (Amdahl,  1967,  pp.  483  -  485).  Equation  3.3 
provides  an  initial  meains  to  determine  if  em  algorithm  is  a 
good  candidate  for  parallelization. 
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B.  INTEL  iPSC/2  HYPERCHBE 

To  maximize  speedup  and  efficiency,  parallel  algorithms 
must  be  developed  with  a  specific  parallel  computer  in  mind. 
In  determining  the  parallel  computing  potential  of  the 
NAVSPASUR  satellite  motion  model,  an  INTEL  iPSC/2  hypercube 
computer,  located  at  the  Department  of  Mathematics  at  the 
Naval  Postgraduate  School,  was  used.  The  iPSC/2  is  a  MIMD 
multicomputer  with  a  hypercube  topology.  The  iPSC/2  consists 
of  a  system  resource  manager  and  eight  individual  processors, 
called  computing  nodes .  The  system  resource  manager,  often 
called  the  host,  provides  the  interface  between  the  user  and 
the  computing  nodes.  The  host  is  a  386-based  computer,  which 
may  be  used  to  process  data  in  addition  to  providing  the 
interface  for  the  user. 

The  computing  nodes  are  complete,  self-contained  INTEL 

80386  microprocessors.  Each  computing  node  also  contains  a 

80387  numeric  coprocessor,  its  own  local  memory,  and  a  Direct- 
Connect  communications  module  (DCM) .  Each  computing  node  may 
be  augmented  by  a  Vector  Extension  (VX)  module  for  pipelined 
vector  operations.  The  iPSC/2  located  at  the  Naval 
Postgraduate  School  contains  only  one  node  with  the  VX  module . 

Communications  eunong  the  nodes  and  the  host  are  completed 
through  message  passing.  The  Direct-Connect  Module  (DCM) 
allows  messages  to  be  sent  directly  to  the  receiving  node 
without  disturbing  the  other  node  processors.  Other  hypercube 
designs  require  messages  to  be  stored  and  forwarded  along  a 
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path  of  connected  nodes  until  the  message  reached  the 
receiving  node . 

The  iPSC/2  uses  a  UNIX  operating  system  and  may  be 
progreunmed  in  Fortran  and  C  languages.  A  more  detailed 
listing  of  the  INTEL  iPSC/2  hyperciibe's  technical 
specifications  is  contained  in  Appendix  B. 

C.  METHODS  OF  PARALLELIZATION 
1 .  Vectorization 

Vector iza.t ion  is  one  method  to  parallelize  an  existing 
sequential  prograun.  Vectorization  is  the  process  of  converting 
blocks  of  sequential  operations  into  vector  instructions  that 
may  be  pipelined.  A  simple  example  of  vectorization  using 
Fortran  is  the  following: 

Secmential  Code: 

Do  10  i=l,N 
10  z (i) =x(i) +y(i) 


Vector  Code  (VAST2) : 

call  vadd{N, X, 1, y, If  z, 1) 

To  assist  in  the  vectorization  of  a  serial  program,  there 
exist  many  commercially-availaUble  vectorizing  compilers. 
(Quinn,  1987,  pp.  233  -  235) 

Vectorizing  compilers  automatically  vectorize 
sequential  program  code  for  execution.  Additionally,  they  may 
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identify  to  the  user  program  constructs  and  data  dependencies 
that  limit  potential  vectcrization .  Vectorization  cannot  be 
maximized  solely  by  a  compiler.  Moac  vectorizing  compilers 
have  a  limited  a±»ility  in  recognizing  sequential  blocks  to  be 
vectorized  and  translations  may  not  be  always  straight 
forward.  INTEL  iPSC/2  contains  the  vectorizing  compiler, 
VAST2  .  The  VAST2  compiler  supports  only  Fortran  progreims  and 
is  limited  to  vectorizing  only  do  loops  and  if  statements . 
(iPSC/2  VAST2  User's  Guide,  1989) 

2 .  Distributing  Computations 

By  parallelizing  tasks  on  individual  processors, 
Vectorization  provides  only  the  first  level  of  parallelism. 
In  order  to  partition  a  program  into  parallel  tasks  to 
distribute  among  the  processors  of  a  multi -computer,  a 
different  strategy  is  needed.  Although  there  exist  many 
commercially-available  vectorizing  compilers,  compilers  which 
identify  higher  levels  of  parallelism  have  not  been  as 
successful.  Therefore,  the  task  of  developing  an  algorithm  to 
efficiently  distribute  computations  eunong  several  processors 
is  left  to  the  user. 

Performance  of  parallel  algorithms  may  be  radically 
different  for  different  parallel  computers.  A  number  of 
factors  such  as  processor  speed,  memory  access  time,  and 
memory  capacity  can  affect  an  algorithm's  performance.  Hence, 
the  strategy  to  parallelize  an  algorithm  must  be  developed 
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with  a  specific  parallel  computer  in  mind.  The  multicomputer 
with  each  node  having  its  own  memory  provides  the  greatest 
flexibility  to  the  user.  For  a  multicomputer,  the  user  must 
partition  the  problem  among  the  processor  nodes.  The 
hypercube  topology  allows  the  user  to  use  the  natural  topology 
of  the  problem  to  decompose  the  problem  into  parallel 
processes .  A  process  is  defined  as  a  single  statement  or  a 
group  of  statements  which  are  a  self-contained  portion  of  the 
total  computations.  Using  the  INTEL  iPSC/2,  two  decomposition 
strategies  are  suggested  (iPSC/2  User's  Guide,  1990.  pp.  4-1  - 
4-6)  : 

•  Control  Decomposition 

•  Domain  Decomposition 

a .  Control  Decomposition 

Control  decomposition  is  the  strategy  of  dividing 
tasks  or  processes  among  the  individual  processors  (rodes) . 
This  strategy  incorporates  a  divide  and  conqpier  approach. 
Control  decomposition  is  recommended  for  problems  with 
irregular  data  structures  or  unpredictable  control  flows. 

One  method  of  control  decomposition  is  for  the 
parallel  prograun  to  self-schedule  tasks.  For  this  method  one 
node  assumes  the  role  of  a  manager  with  the  remaining  nodes 
assuming  roles  as  workers.  The  managing  node  maintains  a  list 
of  processes  to  be  accomplished  and  assigns  a  processes  to  the 
working  nodes.  The  working  nodes  request  jobs,  receive 
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processes,  and  perform  the  indicated  tasks.  Implied  in  the 
self-scheduling  method  is  the  cost  of  one  processor  to  perform 
the  manager  duties.  (iPSC/2  User's  Guide,  1990,  p.  4-4) 

A  second  method  of  control  decomposition  is  to 
pre-schedule  the  processes .  The  exact  tasks  required  of  each 
node  are  explicitly  stated  in  the  parallel  progrcun.  Although 
this  method  saves  the  cost  of  the  managing  node,  care  must  be 
taken  to  ensure  the  processes  are  evenly  distributed  cunong  the 
nodes . 

b.  Domain  Dacompoaition 

Domain  decomposition  is  the  strategy  of  dividing 
the  input  data  or  domain  aunong  the  nodes .  The  partitioned 
sets  of  domain  may  be  specific  data  sets  such  as  blocks  of 
matrix  or  represent  a  specific  grid  such  as  used  in  finite 
difference  or  finite  element  methods  to  solve  partial 
differential  equations.  The  major  difference  between  control 
and  domain  composition  is  that  domain  decomposition  strategy 
requires  each  node  to  perform  essentially  the  saune  tasks  but 
with  different  input  data. 

Domain  decomposition  is  recommended  if  the 
calculations  are  based  on  a  large  data  structure  and  the 
amount  of  work  is  the  saune  for  each  node.  An  exaunple  of 
domain  decomposition  is  multiplying  two  large  matrices  by 
block  multiplication.  Although  domain  decomposition  may  seem 
perfectly  parallelizable  and  thereby  very  efficient,  user  must 
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use  caution  to  ensure  each  input  data  set  requires  essentially 
the  same  2unount  of  work. 

3 .  Iiiproving  Performance 

The  decomposition  of  a  problem  may  require  the  use  of 
the  control,  domain  or  a  hybrid  of  both  strategies  to  be 
efficient.  Once  a  specific  strategy  is  chosen,  several 
factors  should  be  considered  to  improve  the  performance  of  the 
parallel  algorithm.  Those  factors  include: 

•  Load  balance 

•  Communication  to  computation  ratio 

•  Sequential  bottlenecks 

a .  Load  Balance 

Load  balance  refers  to  the  degree  to  which  all 
nodes  are  active.  If  the  work  is  not  evenly  distributed  among 
the  nodes,  the  parallel  algorithm  will  show  constrained 
speedup.  Load  balancing  may  be  achieved  by  decreasing  the 
grain  size  of  the  parallel  tasks,  self-scheduling  tasks,  or 
redistributing  the  domain.  Grain  size  refers  to  the  relative 
amount  of  work  completed  in  parallel.  Pipelined  vector 
operations  is  an  example  of  small  grain  parallel  computing  and 
distributing  computations  may  be  considered  as  large  grain 
parallel  computing. 

b.  Communication  to  confutation  ratio 
Communications  to  computation  ratio  is  the  ratio 

between  the  time  spent  communicating  and  the  time  spent 
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computing.  Except  for  perfectly  parallel  problems,  time  lost 
for  communications  is  inherent  i  n  parallel  algorithms .  A 
large  communication  to  computation  ratio  constrains  a  parallel 
program' s  performance .  The  objective  is  to  maximize  the  time 
a  node  spends  computing  and  to  minimize  the  time  spent 
communicating.  Reductions  in  the  communication  to  computation 
ratio  may  be  accomplished  by  increasing  the  grain  size, 
grouping  messages,  or  recalculating  values  instead  of 

receiving  the  value  from  another  node, 
c.  SBquantial  BottlanBcks 

Sometimes  tasks  cannot  begin  until  completion  of 
a  previous  task,  limiting  nrunber  of  tasks  that  can  be 

completed  in  parallel .  A  sequential  bottleneck  is  the 

circumstance  of  other  processors  waiting  for  another  processor 
to  complete  a  task  before  they  may  continue.  The  portion  of 
operations  that  are  not  completed  in  parallel  can 

substantially  restrict  speedup  as  can  be  seen  by  Amdahl's  Law 
(Equation  3.3) .  Inherent  in  sequential  bottlenecks  are  any 
requirements  of  the  nodes  to  synchronize.  The  only  method  to 
remove  sequential  bottlenecks  is  to'  modify  or  reorder  the 
algorithm  in  order  to  overlap  sequential  code  with  other 
computations . 
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IV.  PARALLELIZATION  OF  PPT2 


The  puirpose  of  this  research  was  to  determine  the 
potential  reduction  in  computation  time  for  the  NAVSPASUR 
satellite  motion  model  through  parallel  computing.  This 
potential  may  be  assessed  by  determining  the  relative  speedup 
and  efficiency  of  various  parallel  algorithms  employing  the 
methods  and  strategies  of  parallelization  discussed  in  Chapter 
III . 

As  stated  in  the  previous  chapter,  the  strategy  for 
developing  parallel  algorithms  depends  heavily  on  the 
architecture  and  topology  of  the  parallel  computer  used.  Due 
to  ease  of  access  and  fauniliarity  with  the  INTEL  iPSC/2 
hypercube,  the  parallel  computing  potential  "of  the  NAVSPASUR 
model  was  assessed  with  respect  to  implementing  the  model  on 
this  specific  multi-computer.  Although  performamce  of  various 
algorithms  may  vary  somewhat  depending  on  the  specific 
parallel  computer,  it  was  hoped  that  some  generalizations  may 
be  made  from  the  application  of  the  NAVSPASUR  model  to  this 
hypercube . 

A .  VECTORI ZATION 

The  first  method  of  parallelization  considered  for  the 
NAVSPASUR  model  was  vectorization .  Vectorization  is  usually 
simpler  than  the  other  methods  of  parallelization  to  apply. 
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Additionally,  if  vectorization  proved  to  be  beneficial,  it  may 
be  incorporated  with  the  other  parallel  computing  methods  in 
order  to  realize  even  greater  speedup  and  efficiency . 

The  realized  speedup  due  to  vectorization  is  a  function  of 
the  number  of  vector  operations  within  a  specific  algorithm. 
Vector  operations  in  Fortran  are  usually  characterized  by  do 
loops  containing  scalar  operations  performed  on  each  element 
of  an  array.  With  each  node  possessing  its  own  vector  co¬ 
processor  (VX  module)  ,  these  do  loops  may  be  replaced  by 
single  calls  to  canned  sxabroutines .  These  subroutines  utilize 
a  vector  co-processor  to  perform  pipelined  vector  operations, 
significantly  reducing  computation  time.  In  addition  to  the 
explicit  vector  operations  within  an  algorithm,  sometimes 
there  exist  blocks  of  scalar  operations  that  may  easily  be 
transformed  into  vector  operations.  Scalar  operations 
contained  within  Fortran  do  loops  and  logical  if  statements 
are  usually  good  candidates. 

Analysis  of  the  Fortran  subroutine  PPT2  shows  that  the 
current  subroutine  contains  very  few  explicit  or  implicit 
vector  operations .  The  only  apparent  vector  operation  in  the 
satellite  state  vector  prediction  portion  of  PPT2  is  the 
computation  of  the  velocity  vector  at  the  very  end  of  the 
algorithm.  The  propagation  of  the  orbital  element  set 
comprises  the  majority  of  the  computations.  The  formulas  used 
to  propagate  the  orbital  elements,  presented  in  Chapter  II, 
may  be  characterized  as  lengthy,  algebraically-complex,  non- 
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linear  scalar  functions  of  the  mean  orbital  elements . 


Attempts  to  transform  these  formulas  into  a  set  of  vector 
operations  quickly  become  algebraically  overwhelming  and  a 
successful  transformation  is  highly  improbadsle . 

Likewise,  with  the  exception  of  the  computation  of  the 
variaUble  T2,  the  scalar  operations  contained  in  the  do  loops 
and  if  statements  of  PPT2  demonstrate  limited  vectorizing 
potential  due  to  data  dependency  within  the  loops  and 
statements.  Therefore,  based  on  this  initial  assessment  of 
limited  vectorizing  potential,  vectorization  was  not 
considered  as  a  viable  method  to  reduce  computation  time  and 
efforts  to  vectorize  PPT2  were  pursued  no  further. 

B.  DISTRIBUTION  OF  COMPUTATIONS 

With  vectorization  deemed  as  not  a  viable  method  of 
parallel  computing  for  the  NAVSPASUR  model,  any  reduction  in 
computation  time  needed  to  be  achieved  through  the  method  of 
distributing  computations.  Both  strategies  of  control 
decomposition  and  domain  decomposition  were  considered. 

In  order  to  better  appraise  the  potential  reduction  of 
computation  time  by  implementing  each  strategy,  separate 
parallel  algorithms  utilizing  the  different  strategies  were 
developed  and  evaluated  with  respect  to  the  measures  of 
speedup  and  efficiency.  Although  a  combination  of  both 
strategies  may  possibly  provide  the  greatest  speedup  and 
efficiency,  the  evaluation  of  separate  algorithms  implementing 
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the  respective  strategies  exclusively  provided  a  better  means 
to  determine  how  the  majority  of  the  reduction  in  computation 
time  was  achieved.  Additionally,  once  the  relative  benefit  of 
each  strategy  is  determined,  it  would  not  be  difficult  to 
incoirporate  both  algorithms  together  on  the  hypercube . 

Using  the  two  distinct  strategies,  two  separate  sets  of 
programs  were  developed  and  evaluated.  Each  prograun  set 
consists  of  two  Fortran  prograuRS  to  be  executed  on  the  INTEL 
iPSC/2 .  A  host  (system  resource  manager)  program  acquires  a 
specified  size  cube;  loads  the  node  program  on  the  processors 
of  the  attached  c\abe;  and,  upon  completion  of  the  algorithm, 
releases  the  cube  for  another  application.  The  node  program 
implements  the  parallel  algorithm.  Although  the  host  may  also 
serve  as  an  additional  processor,  the  parallel  algorithms 
utilized  only  the  nodes  of  the  iPSC/2.* 

Program  set  named  P^T-4  implements  an  algorithm  using  the 
control  decomposition  strategy  and  the  progreim  set  named  P^T 
implements  an  algorithm  using  the  domain  decomposition 
strategy.  Descriptions  of  the  algorithms  and  an  assessment  of 
their  respective  results  are  contained  in  the  subsections 
below. 


’The  routine  used  to  determined  run  times  for  the  various 
programs  is  measured  differently  for  the  host  and  the  nodes. 
In  order  to  obtain  comparsible  times  to  compute  speedup  and 
efficiency,  the  actual  parallel  algorithms  utilize  only  the 
node  processors.  (iPSC/2  Progr2unmer' s  Reference  Manual,  1990, 
p.  3-174.) 
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1 .  Control  Decon^osition  —  P’T-4 

The  strategy  of  control  decomposition  is  to  reduce  the 
NAVSPASUR  model's  computation  time  by  the  concurrent 
completion  of  separate  tasks  (processes)  by  the  individual 
nodes  of  the  hypercube.  By  reducing  the  computation  time 
necessary  to  predict  each  individual  satellite's  state  vector, 
an  overall  reduction  in  computation  time  to  predict  state 
vectors  for  all  tracked  objects  can  be  achieved.  Hence,  the 
ultimate  objective  of  the  prograun  set,  P*T-4,  was  to  reduce 
the  computation  time  for  a  single  object  in  orbit . 
a .  Algorithm 

In  order  to  predict  a  satellite's  state  vector 
considering  the  secular  aind  periodic  correction  terms  due  to 
the  zonal  harmonics  and  a  correction  term  for  each  element  due 
to  the  sectoral  harmonics,  the  NAVSPASUR  model  requires  the 
completion  of  55  major  tasks.  The  majority  of  tasks  are 
evaluation  of  the  formulas  outlined  in  Chapter  II,  some  tasks 
are  a  group  of  computations  such  as  the  group  of  computations 
necessary  to  compute  the  variedjle  T2  or  the  group  of 
computations  to  solve  Kepler's  Equation  by  Steffensen' s 
Method . 

The  first  step  in  partitioning  these  tasks  among 
the  nodes  was  to  determine  which  tasks  could  be  completed 
concurrently.  Concurrency  was  determined  by  the  development 
of  a  hierarchy  of  the  formulas  used  by  the  NAVSPASUR  model . 
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Each  of  the  individual  tasks  were  listed  with  its  respective 
required  input .  Tasks  which  required  output  from  the 
completion  of  other  tasks  were  listed  below  those  tasks . 
Tasks  which  could  be  executed  concurrently  were  listed  on  the 
seune  level  of  the  hierarchy .  Figures  4 . 1  euid  4 . 2  contain  an 
extract  of  this  hierarchy . 

From  this  hierarchy  of  formulas,  the  number  of 
tasks  that  could  be  completed  concurrently  at  each  level  of 
the  hierarchy  ranges  from  2  to  14.  The  levels  where  only  a 
few  (less  than  four)  represent  potential  sequential 
bottlenecks .  These  sequential  bottlenecks  needed  to  be 
overcome  for  a  high  level  of  efficiency  to  be  achieved. 

Additionally,  the  number  of  FLOPS  required  varied 
considerably  among  the  tasks.  Some  tasks  required  as  few  as 
2  FLOPS,  while  other  tasks  required  over  200  FLOPS.  For 
exaunple,  solving  Kepler's  Equation  by  Steffensen' s  Method 
could  require  as  few  as  3  FLOPS  or  as  many  as  650  FLOPS  based 
on  the  speed  of  convergence.®  This  variance  in  the  number  of 
FLOPS  required  by  the  various  tasks  presented  a  potential 
problem  in  load  balancing. 

The  second  step  in  applying  this  strategy  was  to 
determine  the  method  of  scheduling  the  tasks  to  be 
accomplished  aunong  the  nodes.  A  manager-worker  algorithm  as 
described  in  Chapter  III  provides  an  easy  method  to  achieve 

*If  the  error  tolerance  of  less  than  10“®  is  not  met,  the 
NAVSPASUR  model  halts  Steffensen' s  Method  after  20  iterations. 
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Figure  4 . 1  Hierarchy  of  NAVSPASUR  Formulas 


load  balancing  aunong  the  nodes.  However,  for  a  large  number 
of  small  tasks  as  is  the  case  with  the  NAVSPASUR  model,  this 
type  of  algorithm  can  become  communication  intensive.  A  large 
communication— to-computation  ratio  cam  severely  limit  speedup 
and  could  possibly  cause  a  parallel  algorithm  run  longer  than 
the  original  serial  algorithm.  In  am  effort  to  reduce  the 
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Figur«  4.2  Hierarchy  of  NAVSPASUR  Formulas  (continued) 

eunount  of  commvmication  in  the  algorithm,  an  algorithm  that 
pre-scheduled  tasks  was  chosen. 

With  pre-scheduled  algorithms,  each  node  knows  its 
own  tasks  to  accomplish  without  communicating  with  a  "manager" 
node.  Additionally,  the  aJosence  of  a  "manager"  node  frees  one 
more  node  to  assist  in  completing  the  required  tasks .  Despite 
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these  positive  points,  pre-scheduled  algorithms  are  not 
without  their  own  drawbacks .  Pre-scheduling  algorithms  do  not 
provide  automatic  load  balancing  as  is  the  case  with  self¬ 
scheduling  algorithms .  Care  must  be  taken  to  ensure  each  node 
does  essentially  the  szune  aunount  of  work.  Also,  pre¬ 
scheduling  algorithms  require  a  fixed  number  of  nodes. 
Requiring  a  fixed  nximber  of  nodes  restricts  the  flexibility  of 
algorithm  amd  its  potential  speedup. 

The  third  step  in  applying  this  strategy  was  to 
determine  the  ntamber  of  nodes  for  pre-scheduling.  Factors  in 
determining  the  number  of  nodes  or  cube  size  include  the 
potential  requisite  speedup,  the  amount  of  computation 
completed  between  communication  messages,  potential  sequential 
bottlenecks,  the  number  of  messages  required,  and  efficient 
use  of  all  of  the  nodes .  In  an  attempt  to  achieve  an 
appreciable  speedup,  the  algorithm  was  developed  to  use  a 
minimum  of  four  nodes . 

The  final  step  in  applying  this  strategy  was 
assigning  specific  tasks  to  each  node.  Load  balancing  and 
potential  synchronization  problems  were  considered  in  the 
assignment  of  the  tasks  to  the  respective  nodes.  Often, 
communication  distances  are  also  considered  in  assigning  tasks 
to  the  nodes;  however,  with  such  a  small  number  of  nodes  the 
communication  distances  were  negligible.  The  diagram  in 
Figure  4.3  depicts  how  the  tasks  were  distributed  among  the 
four  nodes .  Some  of  the  smaller  initial  tasks  were  duplicated 
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by  several  nodes  in  order  to  limit  the  cimount  of  communication 
and  eliminate  potential  synchronization  problems  at  the 
beginning  of  the  algorithm.  A  complete  listing  of  the  source 
co^e  for  progrcuii  set,  P^T-4,  is  contained  in  Appendix  C. 

b .  Assessment 

To  estaJDlish  a  baseline  to  compare  the  performance 
of  the  parallel  program  sets  with  the  serial  subroutine  PPT2, 
execution  times  for  PPT2  to  predict  the  position  of  a  single 
satellite  and  a  set  of  satellites,  ranging  from  12  to  2C736, 
were  measured.  The  measured  times  were  the  elapsed  time  for 
the  execution  of  PPT2  in  milliseconds  on  a  single  node  using 
the  node's  inter:. al  clock.  Using  ten  different  sets  of 
satellite  data,  the  mean  execution  time  for  propagating  a 
single  satellite  was  11.2  milliseconds.  The  mean  execution 
time  for  PPT2  to  propagate  12  to  20736  satellites  is  depicted 
in  Figure  4.4.  These  mean  execution  times  were  used  to 
compute  the  speedup  and  efficiency  of  both  parallel  program 
sets . 

(1)  Results .  Program  set  P^T-4  was  executed  with 
the  same  sample  satellite  sets  as  were  used  with  PPT2 .  The 
graph  in  Figure  4.5  showt  a  comparison  of  the  mean  executions 
of  P’T-4  and  PPT2  for  a  various  number  of  satellites .  As  one 
can  see,  P^T-4  was  nearly  two  times  slower  than  the  original 
serial  sxibroutine. 
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Figurm  4 . 3  P^T-4  Algorithm 


For  a  single  satellite,  the  meeui  execution  time 
for  P^T-4  was  23.3  milliseconds  as  compared  to  only  11.2 
milliseconds  for  PPT2 .  A  closer  look  at  where  the  time  is 
spent  reveals  the  shortcomings  of  this  parallel  algorithm. 
Table  4 . 1  shows  a  comparison  of  mean  execution  times  for  the 
one  node  executing  PPT2  and  the  four  nodes  executing  P^T-4 . 
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The  times  expressed  in  the  table  are  the  mean  for  the  ten 
sauaple  satellites.  The  communication  time,  t,,  includes  the 
time  spent  sending  and  receiving  messages,  plus  the  time  spent 
waiting  for  messages  to  arrive.  The  computing  time,  t,,,  is 
the  time  each  node  spent  completing  its  respective  tasks. 
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As  seen  by  the  times  listed  in  Tadale  4.1,  two 
problems  with  the  algorithm  become  evident.  First, 
communication  time  outweighs  the  actual  computation  time  for 
each  node .  The  causes  of  the  long  communication  time  are 
number  of  messages  required  by  this  specific  partition  of 
tasks  and  synchronization  problem  of  nodes  waiting  to  receive 
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Table  4 . 1  P*T-4  Execution  Time  Breakdown 


Algorithm 

te 

(milliseconds) 

(milliseconds) 

t 

(milliseconds) 

PPT2 

(one  node) 

11.2 

NA 

11.2 

P’T-4 

node  0 

4.3 

19.0 

23.3 

node  1 

2.2 

15.9 

18 . 1 

node  2 

2.7 

14.7 

17.4 

node  3 

5.8 

15.7 

21.5 

computed  values  from  other  nodes .  Second  and  most 
importantly,  although  the  actual  computation  time  was  reduced, 
the  total  execution  time  of  the  parallel  algorithm  is  longer 
than  the  serial  algorithm  implemented  by  PPT2. 

(2)  Improvements .  The  major  source  of  the  problem 
is  the  communication  to  computation  ratio.  This  parallel 
algorithm  using  four  nodes  requires  23  messages  among  the 
nodes.  The  NAVSPASUR  model  is  not  computationally  intensive 
enough  to  offset  this  amount  of  communication.  To  improve 
performance  this  ratio  must  be  reduced. 

One  method  to  reduce  the  communication  to 
computation  ratio  is  to  reduce  the  aunount  of  communication. 
One  way  to  reduce  the  number  of  communications  is  to 
restructure  the  partitions.  However,  other  partitions  using 
four  nodes  were  analyzed,  yet  none  could  significantly  reduce 
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the  nximber  of  messages.  One  alternative  to  possibly  achieve 
any  speedup  was  to  partition  the  computations  among  fewer 
nodes.  The  diagrami  in  Figure  4.6  depicts  the  distribution  of 
tasks  for  a  two  node  algorithm  P^T-2.  Although  the  algorithm 
displayed  potential  in  reducing  the  total  execution  time  to 
less  than  PPT2,  speedup  would  be  further  bounded  by  two.  A 
speedup  of  two  would  not  outweigh  the  costs  in  procuring  a 
parallel  computer  merely  for  satellite  propagation. 

A  second  alternative  for  reducing  the 
communication  to  computation  ratio  is  to  somehow  increase  the 
aunount  of  computation  between  messages.  The  aunount  of 
computation  could  be  increased  by  computing  the  intermediate 
values  for  n  satellites  in  an  array  and  sending  the  array  in 
one  message.  The  communication  would  remain  essentially 
constant  and  the  computation  between  messages  would  increase 
by  a  factor  of  n.  An  estimate  of  this  improvement  may  be  made 
for  the  mean  times  in  Tcdile  4.1  using  speedup  and  efficiency. 
From  Chapter  III,  efficiency  is  expressed  as  the  following: 


where 


p  pt  <p) 


(4.1) 


t  (p) =t^(p) +t,(p)  (4.2) 

If  E„  is  the  efficiency  of  the  P^T-4  algorithm  computing  n 
satellites'  values  between  messages,  then 
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Figur*  4 . 6  P^T-2  Algorithm 


^  ^  '  nt(l) _ 

"  p(nt,(p) +t.(p)  ) 


(4.3) 


Solving  Equation  4.1  for  t(l)  auxd  substituting  into  Ecpiation 
4 . 3  yields 


65 


Replacing  E  by 


t(l) 


(4.6) 


Take  the  limit  as  n  goes  to  infinity  and  the  upper  bound  for 
E„  is 


"  Ptc(P) 


(4.7) 


Setting  p  equal  to  four  and  using  values  from  Figure  4.1,  E„ 
is  bounded  by  .48.  This  implies  the  maximum  speedup  of  the 
modified  algorithm  would  be  bounded  by  1.92.  Again,  the 
benefit  of  using  this  strategy  is  quite  limited. 

2 .  Domain  Decon^osition  —  P^T 

The  strategy  of  domain  decomposition  is  to  reduce  the 
NAVSPASUR  model's  computation  time  by  the  concurrent 
computation  of  several  satellites'  state  vectors.  Each  node 
of  the  hypercvibe  would  complete  identical  tasks  on  different 
satellite  data  sets,  simultaneously.  Hence,  the  ultimate 
objective  of  program  set  P’T  was  to  reduce  the  overall 
computation  time  for  several  objects  in  orbit. 


a.  Algorithm 

Unlike  the  application  of  the  control 
decomposition  strategy,  the  application  of  the  domain 
decomposition  strategy  to  the  NAVSPASUR  model  was  seemingly 
less  arduous.  First,  because  each  node  propagates  satellite 
data  sets  independent  of  the  other  nodes,  there  exists  no 
requirement  for  communication  or  synchronization  among  the 
nodes.  This  lack  of  communication  simplifies  the  load 
balancing  and  sequential  bottleneck  problems  present  in  the 
P’T-4  parallel  algorithm. 

Second,  because  each  node  may  perform  the 
satellite  state  vector  prediction  tasks  serially,  the  existing 
subroutine  PPT2  may  be  used  with  only  minor  modifications . 
Developing  a  parallel  algorithm  for  predicting  an  individual 
satellite' s  state  vector  was  a  major  task  for  the  control 
decomposition  strategy.  Additionally,  by  using  the  existing 
PPT2  code,  the  other  tasks  completed  by  PPT2  may  be  requested 
by  the  user  using  the  seune  control  varieQjles  as  used  by  the 
original  PPT2  subroutine .  The  P’T-4  program  set  was 
restricted  to  only  predicting  a  satellite's  state  vector. 

Finally,  by  using  the  serial  subroutine  PPT2,  this 
strategy  may  be  reduced  to  only  developing  an  algorithm  to 
distribute  the  data  in  a  timely  manner.  Maximtim  efficiency 
will  be  achieved  if  the  nodes  do  not  have  to  wait  for 
satellite  data  to  propagate. 
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Intuitively,  this  strategy  seems  perfectly 
parallelizeible .  Although  the  various  tasks  performed  by  PPT2 
require  different  computation  times,  the  total  execution  time 
for  each  node  will  be  essentially  the  same  if  it  is  assumed 
that  the  various  tasks  are  randomly  distributed  throughout  the 
input  data  sets.  The  concern  for  this  algorithm  was  the 
potential  sequential  bottlenecks  at  input/output  portions  of 
the  program  set.  Reading  and  writing  to  external  files  can  be 
very  time  consuming.  In  addition  to  the  actual  time  spent 
reading/writing  to  an  external  file,  a  certain  amount  of  time 
is  spent  to  access  the  file.  In  order  to  minimize  this  time, 
the  number  of  calls  to  read/write  to  a  file  should  be 
minimized. 

With  the  specific  iPSC/2  hypercube  available, 
input/output  is  completed  sequentially.  Each  node  must 
compete  with  the  other  nodes  to  read  and  write  to  external 
files.  To  minimize  time  lost  to  accessing  the  file  cataloging 
the  set  of  satellites,  a  node  was  devoted  to  both  the 
reading/distributing  of  input  satellite  data  and  to  the 
collecting/writing  of  the  results.  The  idea  of  using  a  single 
node  to  read  the  data  and  a  single  node  to  subsequent  write 
the  output  is  simple  to  implement  and  proved  to  be  fastest 
method  to  overcome  the  bottlenecks  with  the  input/output .  The 
remaining  nodes  of  the  hypercube  implement  the  NAVSPASUR  model 
using  a  slightly  modified  PPT2 .  The  diagram  in  Figure  4.5 
depicts  how  the  satellite  data  is  distributed.  The  cost  of 
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using  this  simple  algorithm  to  distribute  and  collect  the  data 
is  the  loss  of  two  nodes.  The  only  restriction  on  the  size  of 
the  hyperctibe  required  by  P^T  is  that  the  attached  ciobe  must 
contain  at  least  four  nodes  to  achieve  any  speedup.  A 
complete  listing  of  the  source  code  for  program  set,  P^T,  is 
contained  in  Appendix  D. 

b .  Assessment 

(1)  Results.  The  graph  in  Figure  4.8  depicts  the 
mean  execution  time  for  P^T  versus  the  nvimber  of  satellites 
propagated  using  hypercubes  of  four  and  eight  nodes 
respectively.  P^T  was  successful  in  reducing  the  overall 
execution  time  to  propagate  several  satellites.  TaJole  4.2 
shows  the  speedup  and  efficiency  of  P^T  for  a  various  nxjmber 
of  satellites.  As  seen  in  Table  4.2,  the  speedup  achieved 
using  all  eight  nodes  of  the  hypercube  was  approximately  three 
times  larger  than  the  speedup  achieved  using  four  nodes .  With 
this  parallel  algorithm  using  six  "working"  nodes  for  an  eight 
processor  hypercube  and  only  two  "working"  nodes  for  a  four 
processor  hypercube,  an  increase  in  speedup  by  approximately 
a  factor  of  three  was  expected.  More  noteUDle  was  the  increase 
in  efficiency  using  eight  versus  four  nodes.  The  efficiency 
increased  from  .45  to  .67.  This  increase  in  efficiency 
indicates  that  P’T  applied  to  a  hypercube  of  greater  dimension 
could  yield  even  greater  speedup  and  efficiency. 
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Figura  4 . 7  P^T  Algorithm 


Tadjle  4.2  also  indicates  that  P^T  performamce 
increased  somewhat  with  an  increase  in  the  total  number  of 
satellites  propagated.  Because  with  this  parallel  algorithm 
the  computation  to  communication  ratio  does  not  vary  with  the 
number  of  satellites,  this  small  increase  in  performamce  must 
be  primarily  due  to  the  diminishing  impact  of  the  algorithm' s 


Figur*  4 . 8  P^T  Execution  Times 


overhead  on  total  execution  time.  This  overhead  includes  one 
additional  message  containing  the  total  nxomber  of  satellites 
to  propagate  from  the  distributing  node  to  the  other  nodes; 
some  small  computations  by  working  nodes  to  determine  number 
of  data  sets  to  receive;  and  a  halting  message  sent  by  the 
collecting  node  to  the  host  once  all  of  the  nodes  are 
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Table  4.2  P^T  Performance  (satellite  position  prediction) 


P’T 

Number  of  Satellites 

Sp 

Eo 

8  nodes 

20736 

5.31 

.66 

1728 

5.37 

.  67 

144 

5.24 

.  66 

12 

4.13 

.52 

4  nodes 

20736 

1.78 

.45 

1728 

1.80 

.45 

144 

1.79 

.45 

12 

1.66 

.33 

finished.  Because  these  additional  messages  and  computations 
are  only  completed  once  in  the  program,  the  time  cost 
associated  with  this  overhead  becomes  negligible  as  the  number 
of  satellites  propagated  is  increased.  The  speedup  and 
efficiency  remained  fairly  constant  for  greater  than  144 
satellites . 

To  estimate  the  impact  of  increasing  the  eunount  of 
computation  on  P’T' s  speedup  and  efficiency,  the  execution 
time  to  predict  a  satellite's  position  and  compute  the  partial 
derivatives  of  the  orbital  elements  was  also  measured  for  PPT2 
and  P’T.  These  results  are  summarized  in  Table  4.3.  Both 
speedup  and  efficiency  improved  with  this  increase  in 
computation . 
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Table  4.3  P^T  Performance  (satellite  position  prediction  plus 
cor’putation  of  partial  derivatives) 


P’T 

Number  of  Satellites 

Sp 

Ep 

8  nodes 

20736 

5.48 

.  69 

1728 

5.53 

.  69 

144 

5.45 

.  68 

12 

4.82 

.  60 

4  nodes 

20736 

1 . 84 

.46 

1728 

1.86 

.  47 

144 

1 . 96 

.47 

12 

1.82 

.46 

(2)  Improvements .  The  performance  results  of  this 
algorithm  using  only  four  and  eight  nodes  indicated  a 
potential  increase  in  both  speedup  and  efficiency  if  this 
algorithm  could  be  applied  to  a  hypercube  of  greater 
dimension.  Because  the  number  of  working  nodes  is  not  fixed 
for  this  algorithm,  P^T  could  be  applied  easily  to  any  size 
hypercube  with  no  modifications.  The  efficiency  of  the 
algorithm  should  increase  with  the  cube  dimension  until  the 
time  to  distribute  a  separate  satellite  data  to  each  working 
node  exceeds  the  time  required  by  node  to  propagate  a  single 
satellite.  Therefore,  a  possible  improvement  in  the 
algorithm' s  performance  can  be  achieved  by  applying  the 
algorithm  to  an  optimal  dimension  hype r cube . 
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Because  the  hypercvibe  at  the  Naval  Postgraduate 
School  is  limited  to  eight  nodes,  a  model  was  used  to  estimate 
the  optimal  hypercxibe  dimension.  The  total  execution  time  for 
P^T  to  propagate  n  satellites  with  p  processors,  t (p) ,  can  be 
modeled  by  the  following  expression: 

t  (P)  =fc»j  (P) +t„^(P) +tc(P)  (4.8) 

where  t^^  (p)  is  the  time  the  last  node  must  wait  to  receive  its 
first  satellite  data  set,  t^2  (P)  is  total  time  the  last 
node  must  wait  to  receive  all  of  its  svibsequent  satellite  data 
sets,  and  t^(p)  is  the  time  for  each  node  to  propagate  each 
share  of  the  n  satellites. 

As  described  in  previous  chapter,  the  iPSC/2  uses 
a  Direct-Connect  Module  (DCM) .  This  module  provides  an 
essentially  constant  startup  time  for  a  message  to  be  passed 
between  two  nodes  regardless  of  the  length  of  the  message 
path.  Hence,  the  time  to  send  a  message  between  two  nodes  is 
a  function  of  only  the  sir.e  (number  of  bytes)  of  the  message. 
Because  all  messages  between  the  distributing  node  and  working 
nodes  are  of  a  constant  size  (674  bytes) ,  the  time  of  a  single 
message  between  the  distributing  node  and  each  working  node  is 
essentially  constant.  For  this  algorithm,  there  are  p-2 
working  nodes .  Denoting  the  time  to  send  a  single  message 
between  the  distributing  node  and  a  working  node  as  t„(l)  ,  the 
hwi  (p)  may  be  modeled  by  the  following: 


74 


ip)  =  [  (P-2)  -1]  t_(l)  =  (p-3)  (1) 


(4.9) 


In  order  to  determine  t.  (1),  several  experiments  were  run 
using  the  specific  iPSC/2  located  at  the  Naval  Postgraduate 
School.  The  mean  value  of  t„(l)  was  approximately  .693 
milliseconds . 

A  working  node's  total  wait  time  for  svibsequent 
satellite  data  sets,  t^2(p)^  is  a  function  of  the  elapsed  time 
for  the  working  node  to  propagate  a  single  satellite,  the 
elapsed  time  for  the  distributing  node  to  send  a  subsequent 
satellite  data  set  to  the  working  node,  and  the  number  of 
satellites  the  working  node  must  propagate.  Because  the 
distributing  node  distributes  the  data  while  the  working  nodes 
are  computing,  the  wait  time  is  zero  if  the  subsequent 
satellite  data  arrives  before  the  working  node  is  ready  to 
receive.’  If  the  subsequent  satellite  data  arrives  after  the 
node  is  finished  with  the  previous  satellite,  the  wait  time  is 
the  difference  between  the  computing  time  for  the  previous 
satellite  and  the  elapsed  time  for  the  distributing  node  to 
send  the  node  another  satellite  data  set .  Because  the 
distributing  node  must  send  a  data  set  to  each  of  the  other 
working  nodes  prior  to  sending  a  sxibsequent  data  set  to  the 
last  node,  the  elapsed  time  for  the  distributing  node  to  send 
another  data  set  may  be  also  modeled  by  t^i  (p)  in  Equation  4.9. 

’if  a  node  is  not  ready  to  receive  a  message  from  another 
node,  the  message  is  stored  in  a  local  buffer.  The  time  to 
read  from  this  buffer  is  negligible. 
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The  total  wait  time  is  then  the  wait  time  for  each  subsequent 
satellite  data  set  multiplied  by  the  nvimber  of  data  sets 
received  by  each  working  node.  Hence,  by  letting  tl  represent 
the  time  to  propagate  a  single  satellite,  t^2  (P)  t>e  modeled 
by  the  expression: 


0  ift^^{p)<tl 

■  (^-1)  (t,,(p) -tl)  if  t^,{p)>tl 

p-2 


(4.10) 


Assuming  the  time  for  one  node  to  propagate  n 
satellites,  t(l),  is 


t(l)  =n(tl) 

The  total  computation  time  for  each  working  node,.  to(p),  may 
be  approximated  by  the  continuous  function: 


(4.11) 

p-2 

Substituting  Equation  4.8  into  Equations  3.2  and 
3.3,  the  speedup  and  efficiency  using  a  total  of  p  processors 
may  be  modeled  by  the  following  expressions : 


5  ^  t(l)  ^ _ n(tl) _ 

^  tip)  (p) +t^2(p) +t„(p) 

E  _ n(tl) _ 

P  P[tvi(P)  +tc(P)  ] 


(4.12) 


Setting  tl  equal  to  11.2  milliseconds  and  t„(p)  equal  to  .693 
milliseconds,  t (p) ,  Sp,  and  Ep  were  computed  using  Equations 
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4.9  -  4.12.  Figures  4.9  and  4.10  depict  the  estimates  of 
t  (p)  ,  Sp,  and  Ep  for  propagating  1728  satellites  using  4  to 
1024  processors  (a  cube  dimension  of  2  to  10)  .  Using  the 
above  model,  P^T  is  capeible  of  achieving  a  maximum  speedup 
over  16  with  a  corresponding  efficiency  of  approximately  90 
percent  for  a  hyperciibe  of  dimension  5  (2®  nodes)  .  A 
hyperciobe  of  dimension  4  achieves  a  speedup  of  nearly  14  and 
an  efficiency  of  almost  90  percent.  Although  these  graphs  are 
only  estimates  of  the  actual  values  of  speedup  and  efficiency, 
they  correspond  closely  to  the  actual  timed  results  for  four 
and  eight  node  size  hypercubes  and  provide  a  good  indication 
of  the  parallel  computing  potential  of  this  algorithm  for 
higher  dimension  hypercubes . 

Another  possible  improvement  to  this  domain 
decomposition  algorithm  is  to  eliminate  the  need  for  the 
distributing  and  collecting  nodes.  Although  the  iPSC/2 
located  at  the  Department  of  Mathematics,  Naval  Postgraduate 
School  is  not  capaible  of  concurrent  input  and  output, 
concurrent  file  systems  are  available.  Separate  I/O  nodes 
allow  the  computing  nodes  of  a  hypercube  to  concurrently  read 
and  write  to  external  files.  A  concurrent  file  system  would 
eliminate  the  need  of  the  distributing  and  collecting  nodes . 
Additionally,  the  INTEL  Concurrent  File  System  (CFS)  allows 
for  a  common  file  pointer  to  be  maintained  aunong  the  computing 
nodes,  minimizing  overhead  in  algorithm.  Depending  on  the 
efficiency  of  the  I/O  nodes,  a  further  increase  in  the  overall 
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Figure  4.9  Estimated  Execution  Time  of  P^T  for  Various 
Hypercube  Sizes 

algorithm  could  be  expected.  For  further  information  on  the 
INTEL  Concurrent  File  System  see  (iPSC/2  User's  Guide,  1990, 


7-1  -  7-18) . 
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V.  CONCLUSIONS  AND  SUGGESTIONS  FOR  FURTHER  RESEARCH 


The  ultimate  objective  of  this  thesis  is  determine  the 
parallel  computing  potential  of  the  NAVSPASUR  satellite  motion 
model.  From  the  results  given  in  Chapter  IV,  vectorization 
and  control  decomposition  strategies  proved  not  beneficial  in 
significantly  reducing  the  computation  time  of  the  NAVSPASUR 
model.  Very  few  apparent  vector  operations  exist  in  the 
model,  and  any  attempt  to  transform  the  formulas  into  vector 
operations  became  algebraically  overwhelming.  Although  the 
analytic  formulas  of  the  NAVSPASUR  model  are  qpaite  lengthy  and 
complex,  they  proved  to  be  not  computationally  intensive 
enough  to  warrant  decomposition  of  the  algorithm  by  tasks. 

On  the  other  hand,  the  domain  decomposition  strategy 
showed  promise  if  the  satellites  are  propagated  in  a  batch 
mode.  The  P^T  algorithm  was  simple  to  apply.  The  algorithm 
provided  the  flexibility  to  vary  the  dimension  of  the 
hypercube  and  to  easily  modify  the  model  itself.  Although 
only  a  maximum  efficiency  of  .67  was  achieved,  the  potential 
efficiency  was  artificially  bounded  by  the  number  of  nodes 
availsdDle  with  the  specific  hypercube  multi-computer  used. 
Having  a  maximum  of  only  eight  nodes  availe±)le,  the  efficiency 
of  P^T  is  bounded  aJaove  by  .75.  Using  the  model  of  P^T's  total 
execution  time  described  in  Chapter  IV,  it  was  shown  that  a 
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maximum  efficiency  of  just  under  90  percent  could  be  achieved 
with  a  hypercube  consisting  of  16  nodes.  The  corresponding 
speedup  factor  of  nearly  14  would  significantly  reduce  the 
time  to  predict  the  state  vectors  for  several  thousand 
satellites . 

The  success  of  P’T  manifests  several  possible  areas  of 
future  research.  One  area  would  be  to  apply  P^T  to  higher 
dimension  cvibes  and  validate  the  estimates  of  speedup  and 
efficiency  using  more  than  eight  nodes.  Because  the  number  of 
working  nodes  is  not  fixed  for  this  algorithm,  P^T  could  be 
applied  easily  to  any  size  hypercube  with  no  modifications. 
Once  the  optimal  size  is  found,  one  could  attach  several  sub¬ 
cubes  of  the  optimal  dimension  and  determine  the  benefit  of 
propagating  several  smaller  catalogs  of  satellite  data  instead 
of  propagating  one  large  catalog. 

Another  possible  area  of  research  would  be  to  modify  the 
current  satellite  motion  model  to  increase  the  accuracy  of  its 
predictions .  The  results  in  the  previous  chapter  showed  an 
increase  in  performance  of  the  P’T  if  the  eunount  of 
computation  was  increased.  Hence,  greater  accuracy  could  be 
achieved  in  far  less  time  using  the  P*T  algorithm  than  the 
time  using  the  original  serial  algorithm.  Additionally,  from 
these  results,  the  parallel  computing  potential  of  satellite 
motion  models  that  are  more  computationally  intensive  would  be 
greater.  For  exeunple,  semi-analytic  models  which  combine  the 
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benefits  of  analytic  and  numerical  models  might  be  good 
candidates . 

Overall,  the  main  lesson  learned  from  this  thesis  is  that 
satellite  position  prediction  can  be  made  more  timely  through 
parallel  computing.  Although  the  best  method  of 
parallelization  might  vary  depending  on  the  specific  model 
used,  parallel  computing  is  a  vizible  option  to  achieve  timely 
satellite  position  prediction  for  the  growing  niimber  of 
objects  in  orbit  around  the  Earth. 
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APPENDIX  A 


INPUT/OUTPOT  OF  PPT2 


This  appendix  contains  a  listing  of  the  primary  variables 
used  by  NAVSPASUR  subroutine,  PPT2.  Tediles  A.l  -  A.  4  are 
extracted  from  (Solomon,  1991,  pp.  12  -  14)  .  TaJ^le  A. 5  was 
interpreted  from  the  NAVSPASUR  source  code. 

Table  A.l  PPT2  Calling  List 


VarieUale 

Definition 

Input 

Output 

IND 

control  variable 

I 

TM 

time 

I 

0 

KZ 

0  for  inertial  coordinates 

1  for  Earth-fixed 

I 
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Table  A.  2  PPT2  Input  Control  Varicibles 


Varieible 


KF(3) 


KF(2),  KF(4) 
KF  (10) 


Option 


Position,  no  partial  derivatives 
Position,  partial  derivatives 
Secular  recovery 
Epoch  update 


Inertial  coordinates 
Earth-fixed  coordinates 


Secular  and  drag  corrections  only 

Secular,  drag,  and  periodic  corrections 

Secular,  drag,  periodic,  and 
sectoral/tesseral  corrections 


Not  used  by  PPT2 


Table  A.  3  PPT2  Conunon  Blocks 


Block 

Variable 

Description 

Input 

Output 

CONS 

A(64) 

constants  set  by 
siibroutine  CONSl 

I 

PPT 

F(25) 

stored  mean  elements 

I 

0 

OSC (10) 

osculating  elements 

0 

KF  (10) 

control  varic±>les 

I 

0 

CF (10) 

used  for  appearance 
prediction 

I 

0 

BS(3,4) 

observation  stations 
position  (R,  2:,6,A) 

I 

U(3) 

t 

0 

V(3) 

•O' 

0 

W(3) 

W 

0 

R 

r 

0 

VEL (3) 

f,  velocity 

0 

DCSUB 

PE (6, 8) 

partial  derivatives 

0 

.  .  . 

others  not  used  in 

PPT2 

FOREO 

RHO (3) 

P 

0 

RHOS 

P' 

0 

HDR 

0 

HDV 

4-  f 

0 

RDV 

p-r 

0 

DEL 

Af 

0 

ITER 

iteration  counter 

0 
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Table  A. 4  Array  F 


Index 

Variable 

Definition 

1 

1" 

mean  mean  anomaly 

2 

m 

mean  motion 

3 

Ma 

first  decay  term 

4 

M3 

second  decay  term 

5 

e" 

eccentricity 

6 

g" 

mean  argument  of 
perigee 

7 

h" 

mean  ascending  node 

8 

cos  I" 

cosine  inclination 

9 

t 

epoch 

10 

# 

revolution  number 

11 

dg'Vdl" 

12 

dh'Vdl" 

13 

a" 

mean  semi-major  axis 

14 

& 

15 

sin  I" 

sine  inclination 

16 

17-25 

not  used  in  PPT2 
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Table  A. 5  Array  OSC 


Index 

Variable 

Definition 

1 

cosine  true  anomaly 

2 

sin  f 

sine  true  anomaly 

3 

1 

osculating  mean  anomaly 

4 

e 

osculating  eccentricity 

5 

g 

osculating  argument  of  perigee 

6 

h 

osculating  ascending  node 

7 

cos  I 

cosine  osculating  inclination 

8 

a 

osculating  semi-major  axis 

9 

Al 

secular  and  drag  correction  term  for  1 

10 

not  used  by  PPT2 
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APPENDIX  B 


INTEL  iPSC/2  SPECIFICATIONS 

This  appendix  contains  a  sununary  of  the  iPSC/2  hypercube 
multi-computer  specifications  as  described  in  (iPSC/2  User's 
Guide,  1990,  pp.  1-1  -  1-11) .  The  exact  performance  values 
were  obtained  from  (Arshi,  1988,  pp.  17  -  22) . 

iPSC/2 


System  Resource  Manager 
Central  Processing  Unit 
Numeric  Processing  Unit 
Memory 

External  Communication 

Operating  System 


(Host) 

INTEL  80386  (4  MIP) 

INTEL  80387  (250  KFLOP  64-bit) 
8  Mbyte 

Ethernet  TCP/IP  local  area 
network  port 

AT&T  UNIX,  Version  V,  Release 


3.0 


Nodes 

Node  Processor 
Numeric  Co-processor 
Operating  System 


INTEL  80386  (4  MIP) 

INTEL  80387  (250  I^LOP  64-bit) 
NX/2 
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Internal  Communication 

Direct-Connect™  Routing  Message  Latency  —  350  |i.sec 

Message  Bandwidth  —  2.8 
Mbytes/sec 


89 


APPENDIX  C 


P^T-4  SOtTRCE  CODE  LISTING 

This  appendix  contains  the  listing  of  host  and  node 
programs  comprising  P®T-4  prograun  set.  The  host  program  reads 
the  satellite  data,  loads  the  node  program,  and  writes  the 
results  to  an  external  output  file.  The  node  program  contains 
the  instructions  for  the  four  nodes  to  complete  their 
respective  portions  of  the  P*T-4  parallel  algorithm.  The  node 
program  is  limited  to  only  satellite  state  vector  prediction . 


Host  Program: 


program  P3T4h 

This  host  program  reads  the  satellite  data  from  a 
external  file,  loads  the  program  P3T-4n  on  the  hypercube 
nodes,  and  writes  results  to  external  output  file. 

implicit  real*8  (a-h,o-z) 

dimension  sat (49, 6000) , end (49, 1) 
integer  pid,msglen, eof 

data  pid/0/,msglen/392/ 
data  isat/1/ 

call  setpid(pid) 

Load  node  program  P3T4n  on  nodes 

print*, 'host  loading  nodes' 
call  load (' p3t4n' , -1, pid) 

Read  satellite  data  and  send  data  to  nodes  until  reach 
end-of-f ile 

open (unit"10, file—' /usr/phipps/in4' , form—' unformatted' ) 
read  (unit— 10, iostat— eof )  (sat ( j, 1) , j— 1, 49) 

20  if  (eof.ge.O)  then 

call  csend (5, sat (1, isat) ,msglen, -1, pid) 
isat— isat+l 

read  (unit— 10, iostat— eof )  (sat ( j, isat) , j— 1, 49) 
Receive  results  from  nodes 
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call  crecv (99, sat (1, isat-1) ,msglen) 
go  to  20 
endif 

Send  message  for  nodes  to  halt  and  clear  cube  for  next  process 

call  csend (5, end,msglen, -1, pid) 

close (unit“10) 

call  killcube  (-1,-1) 

Write  results  to  external  file 

open (unit"ll, file-' /usr/phipps/out4' , form—' unformatted' ) 
do  22  i— l,iaat-l 

22  write (11, *) (sat ( j, i) , j-1, 49) 
close (unit— 11) 

stop 

end  * 
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program  P3T4n 

This  node  program  is  a  parallel  code  for  satellite 
position  prediction  using  the  NAVSPASUR  model.  This 
program  employs  four  nodes  using  the  control 
decomposition  strategy.  The  specific  tasks  for  each 
node  are  separated  by  logical  if  statements. 


implicit  realms  (a-h,m-2) 
real*8  kz 


Declare  cube  specific  function  and  variables  as  integers 

integer  mynode , mclock , msgwait , myhost 
integer  mynod, pid, msglen, host id 

common  /ppt/f  (25),osc(9),u(3),v(3),w(3), vel (3) , r, tm, kz 

common  /cons/c(25) 

common  /n/theta2, etaO, eta20, esqO 

common  /g/g2,g2p,g3,g4,g5 

common  /crit/t2 

common  /sec/agda, agde, agdi, agdl, agdg,  agdh 

data  pid/0/,msglen/8/ 

hostid—myhost () 
mynod—mynode ( ) 

Synchronize  nodes 

call  gsyncO 

Nodes  set  constants  and  receive  data  from  host 


call  consl 

call  crecv (5, f ,msglen*49) 

Nodes  continue  to  execute  ppt3  until  catalog  of 
satellite  data  is  exhausted 

1100  if <tm. eq. 0 . OdO)  go  to  1101 

Node  2  computes  new  T2  and  sends  to  other  nodes 

if (mynod . eq . 2 ) then 
call  critincl 

call  csend (mynod, t 2, msglen, -l,pid) 
endif 

Nodes  execute  respective  portion  of  subroutine  ppt3 
call  ppt3 (mynod) 

Node  0  send  results  to  host 


if (mynod. eq.O) then 

call  csend(99, f ,msglen*49, hostid, pid) 
endif 
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*  Repeat  until  catalog  of  satellite  data  is  exhausted 

call  crecv (5, f ,maglen*49) 
go  to  1100 

1101  continue 

*  End  main  program 

atop 

end 


svibroutine  ppt3  (mynod) 

This  subroutine  preschedules  the  tasks  for  each  node 
numbered  0  through  3 .  Tasks  are  partitioned  by  logical 
if  statements 

Declare  variables 

implicit  real*8  (a-h,m-2) 
real*8  kz 

Declare  cube  specific  function  and  variables  as  integers 

integer  mynode,mclock,msgwait 
integer  mynod, pid,msglen 
integer  msg4(10) 

dimension  msgl (2) ,msg3 (2) 

Define  common  blocks 

common  /ppt/f (25) , osc (9),u(3),v(3),w(3) , vel (3) , r, tm, kz 

common  /cons/c(25) 

common  /n/theta2,eta0,eta20,esq0 

common  /g/g2, g2p, g3, g4, g5 

common  /crit/t2 

common  /sec/agda, agde, agdi, agdl, agdg, agdh 
equivalence  (magi (1) , osc (6) ) , (msg3 (1) , osc (IX ) 


data  pid/0/,msglen/8/ 


if (mynod. ne .2) then 
esqO-f (5) *f (5) 
theta2-f (8) *f (8) 
eta20“l . OdO-esqO 
etaO— dsqrt (eta20) 

*000000000000000000000000000000000000000000000000000000000 
if (mynod. eq. 0) then 


Post  asynchronous  receive  message  commands 
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msg4  (1)  “'irecv  (2,t2,msglen) 
msg4 (2) “irecv (1, oac (5) ,maglen) 
msg4 (3) “irecv (3, agda,m3glen*6) 

Recover  a  from  mean  orbital  elements 

call  recover 

Send  a  to  all  nodes 

call  csend (mynod, £ (13) ,msglen, -1, pid) 

Compute  g2,g2p,g3,g4,g5 

call  ganma 
hl“tm-f  (9) 

Compute  osculating  l,e,a 

osc (9)“( (£(4) *hl+f (3) ) *hl+f (2) ) *hl 
osc  (3) “pie (osc (9) +f (1) ) 
f (14)“-4.0d0*f (13) *f (3) / (f (2) *3.0d0) 
f (16)“f (5) *eta20*f (14) /f (13) 

osc (4) “dminl (dmaxl (0 . OdO, f (5) +f (16) *hl) , . 99999999d0) 
osc(0)“dmaxl (l.OdO, f (13)+f (14) *hl) 

Send  1,  e  and  a  to  other  nodes 

call  message (osc (3) , osc (4) , osc (8) , mynod, -1,0) 


Compute  sin  I 

f (15) “dsqrt (1 . 0d0-theta2) 

Receive  t2  from  node  2 

call  msgwait (msg4 (1) ) 

msg4 (4) “irecv (2,msg3, 2*msglen) 

Ma)ce  preliminary  computations  for  edll 

tt2“theta2*t2 

pl“ (-8 . 0d0*tt2-3 . OdO) *theta2+l . OdO 

p2“ (-40 . d0*tt2-ll . OdO) *theta2+l . OdO 

vlel“0 . 125d0*g2p*p2-  (5 . 0d0*g4'*pl/  (12 . 0d0*g2p)  ) 

vlel“vlel*f (5) *eta20 

pl“ (-24 . 0d0*tt2-9. OdO) *theta2+l . OdO 

p2“ (1 . 25d0+ . 9375d0*esq0) *g5 

vle2“ (pl*p2+g3) *eta20*f (15) / (4 . 0d0*g2p) 

vll2“vle2+3 . 0d0*eta20*0 . 15625d0*esq0*f (15) »g5*pl/g2p 

pl“ (-16 . 0d0*tt2-5 . OdO) *theta2+l . OdO 

vle3“pl*f (15) ^eta20*e3q0*g5*35 . OdO/ (-3 . 84d02*g2p) 

Receive  g  from  node  1 

call  msgwait (msg4 (2) ) 
m3g4 (5) “irecv (1, DE,msglen) 

Compute  edll 
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cosg»dco3 (osc (5) ) 
sing— dsin (osc (5) ) 


pi— 4 . 0d0*cosg**3-3 . OdO*cosg 
p2— 2 . 0d0*3ing*co3g 

edll— (vlel*p2-vll2*cosg-vle3*pl) *etaO 

Receive  cos  f,  sin  f  from  node  2 

call  msgwait (msg4 (4) ) 
msg4 (6) — irecv (2, 0S,m3glen) 

Compute  ed21 

w22- (1 . OdO+osc (4) *osc (1) ) * (2+03C (4) *osc (1) ) /eta20 

pl-(3.0d0-4.0d0*osc (2) **2) *osc(2) * (1 . OdO-2 . 0d0*sing**2) 

pi- (4 . 0d0*o3C (1) **2-3 . OdO) *03c (1) *p2+pl 

p3— (1 . OdO-2 . 0d0*sing**2) *03C (2) +osc (1) *p2 

p4-3 . OdO* (1 . 0d0-theta2) 

p5— 1 .  OdO+3 . 0d0*theta2 

eta3  eta20*eta0 

ed21- (pi* (w22+l . OdO/3 . OdO) +p3* (1 . 0d0-w22) ) *p4 

ed21- (ed21+03C (2) * (w22+l . OdO) *p5*2 . OdO) *eta30*g2p/-4 . OdO 

Send  ed21  to  node  2 

call  csend (mynod, ed21,msglen, 2, pid) 

Compute  edl  and  a 

edl-edll+ed21 

pi- (cosg*co3g-sing*sing) * (osc (1) *osc (1) -osc (2) *osc  (2) ) - 
&  p2*2 . OdO*osc (1) *osc (2) 

p6-(l. OdO+osc (4) *osc(l) ) **3 

osc (8) —osc (8) * (1 . 0d0+g2p/eta20* (p5* (p6-eta30) + 

&  p4*p6*pl) ) 


Receive  sectoral  corrections  from  node  3 

call  msgwait (msg4 (3) ) 

DL-edl+agdl 

osc ( 8 ) —osc ( 9 ) +agda 

Receive  DE  from  node  1 

call  msgwait (msg4 (5) ) 

msg4 (7) — irecv (l,magl,msglen*2) 

Compute  final  value  for  e  and  1 

esq— de**2+dl**2 
osc (4) — dsqrt (esq) 
sinl— dsin (osc  (3) ) 
cosl— dcos (osc (3) ) 

osc (3) — artnq (DE*sinl+DL*cosl,DE*cosl-DL*3inl) 
Solve  Kepler' s  eqn 
call  )cepler 


if (kz .ne . 0 . OdO) cf-c (12) 
eta2— 1 . OdO-esq 

r— osc (8) *eta2/ (1 . 0d0+o3c (4) *osc (1) ) 

Receive  final  computations  from  other  processors 

call  msgvait (ms94 (7) ) 
call  msgwait (msg4 (6) ) 

Compute  g 

osc (5) “os-osc (3) -osc (6) 

Compute  position  and  velocity  vectors 

cosg-dcos (osc (5) ) 
sing^dsin (osc (5) ) 
cosh«dco3 (osc (6) ) 
sinh-dsin (osc (6) ) 

pl“sing*03C (1) +co3g*osc (2) 
p2— cosg*osc (1) -sing*o3c (2) 
sini—dsqrt (1 . OdO-osc (7) **2) 

u (1) — co3h*p2-3inh*pl*03c (7) 
u  (2)  ••3inh*p2+cosh*pl*osc  (7) 
u (3) — pl*sini 


V (1) — cosh*pl-sinh*p2*osc (7) 

V  (2)—3inh*pl+co3h*p2*osc  (7) 

V  (3)  'T>2*8ini 

w (1) "sinh^sini 

V  (2)  —cosh^sini 
w (3) -osc (7) 

p3-03c (4) *osc (2) 
p4-o3c ( 4 ) *o3c ( 1 ) +1 . OdO 
pS-dsqrt (eta2*osc (8) ) 

do  11  i-1,3 

11  vel (i) - (p3*u (i) +p4*v (i) ) /p5 

vel (1) — vel (l)+cf*r*u(2) 
vel (2) —vel (2) -cf*r*u (1) 

End  for  node  0 


endif 

*00000000000000000000000000000000000000000000000000000000 

*11111111111111111111111111111111111111111111111111111111 

*  Begin  node  1 

if (mynod . eq . 1 ) then 


m3g4 (1) -irecv (0, f (13) ,msglen) 
msg4 (2) —irecv (2, t2,msglen) 


96 


msg4 (3) »irecv (3, adga,m3glen*6) 


Compute  sin  i  and  tan  i 

f (15)-dsqrt (1 . 0d0-theta2) 
tani-f (15)/f (8) 
hl-tm-f (9) 

pl-1 . OdO-5 . 0d0*theta2 

p2— 35 . OdO+24 . 0d0*eta0+25 . 0d0*eta20 

p3-90 . OdO-192 . 0d0*eta0-126 . 0d0*eta20 

p4-385 . OdO+360 . OdO^etaO+45 . 0d0*eta20 

p5— 270 .  OdO+126 . 0d0*eta20 

p6-385 . OdO-189 . 0d0*eta20 

theta4-theta2*theta2 

Receive  a  from  node  0 

call  msgwait (msg4 (1) ) 

Compute  g2,  g2p, g3, g4, g5 

call  gamma 

Compute  g 

osc(9)-( (f (4) *hl+f (3) ) *hl+f (2) ) *hl 
f  (11)  — 1.5d0*g2p*pl+.09375d0''g2p**2*  (p2+p3*theta2 
&  +p4*theta4) +.3125d0*g4* (21 . OdO-9 . 0d0*eta20+p5*theta2 

&  +p6*theta4) 

ql-f (2) (13) **1.5d0 
f (ll)-f (ll)/ql 

osc (5) *pie (f (6) +f (11) *o3C (9) ) 

Send  g  to  all  nodes 

call  csend (mynod, osc (5) ,m3glen, -l,pid) 

Complete  preliminary  computations  to  solve  for  die  and  dli 

cosg— dcos (osc (5) ) 

sing— dsin (osc  (5) ) 

p3— (3 . OdO-4 . 0d0*sing**2) *3ing 

p4— 2 . 0d0*cosg**2-l . OdO 

p5-2 . 0d0*3ing*cosg 

p 6— cosg* cosg- sing* sing 

Receive  t2  from  node  2 

call  msgwait (msg4 (2) ) 

m3g4 (4) — irecv (2,msg3,m3glen*2) 

tt2-theta2*t2 

pi- (-8 . OdO*tt2-3 . OdO) *theta2+l . OdO 

p2- (-40 . d0*tt2-ll . OdO) *theta2+l . OdO 

vlel-0 .125d0*g2p*p2- (5 . 0d0*g4*pl/ (12.0d0*g2p) ) 

vlel— vlel*f (5) *eta20 

pi- (-24 . 0d0*tt2-9 . OdO) *theta2+l . OdO 

p2-(1.25d0+.9375d0*esq0) *g5 

vle2- (pl*p2+g3) *eta20*f (15) / (4 . 0d0*g2p) 

pi- (-16 . 0d0*tt2-5 . OdO) *theta2+l . OdO 

vle3-pl*f (15) *eta20*esq0*g5*35 . OdO/ (-3 . 84d02*g2p) 
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Compute  die  and  dli 


dle«^lel*p4+vle2*3ing+vle3*p3 
dli**-f  (5)  *dle/  (eta20*tani) 

Receive  1,  e  and  a  from  node  0 

call  mesaage (osc (3) , oac (4) ,osc (8) ,mynod, 0, 1) 

Receive  cos  f,  sin  f  and  r  from  node  2 

call  msgwait (msg4 (4) ) 

Compute  d2i 

pi— (4 . 0d0*osc (1) **2-3 . OdO) *osc (1) *p4 
pi— pl-p5* (3 . OdO-4 . 0d0*osc (2) **2) *03c (2) 
p2— p4*osc (1) -p5*osc (2) 

p3— (osc (1) *03C (1) -osc (2) *osc (2) ) *p6-2 . 0d0*p5*osc (1) *osc (2) 
d2i-( (pl+3.0d0*p2) *f (5) +3 . 0d0*p3) *f (15) *f (8) *g2p 

Compute  d2e 

w20— osc (1) * (3 . OdO+osc (4)*osc(l)*(3. OdO+osc (4) *osc (1) ) ) 
p4— etaO+1 . OdO/ ( 1 . OdO+etaO) 
p5— 1 . 0d0-theta2 

d2e-0.5d0*g2p* ( (3 . OdO*theta2-l . OdO) * (M20+f (5) *p4) 

&  +3 . 0d0*p5* (w20+f (5) ) *p3-eta20*p5* (3 . 0d0*p2+pl) ) 

Compute  di  and  de 

di-dli+d2i 

de-dle+d2e 

Receive  sectoral  corrections 

call  msgwait (msg4 (3) ) 

Compute  DE  and  send  to  node  0 

DE-de+osc ( 4 ) +agde 

call  csend (mynod, DE,msglen, 0,pid) 

Compute  DI 

pi-  (f (8) +1 . OdO) /2 . OdO 
p2-dsqrt (pi) 
p3-dsqrt ( 1 . OdO-pl ) 

DI— p3+ . 5d0*p2* (di+agdi) 

Receive  osc (6)  and  DH  from  node  3 

call  message (osc (61 , DH, dummy, mynod, 3, 1) 

Compute  final  cos  i  and  h  and  send  to  node  0 

osc (7) -1 . OdO-2. OdO* (DI**2+DH**2) 
sinh— dsin (osc (6) ) 
cosh— dcos (osc (6)  ) 

osc (6) — artnq (DI*3inh+DH*co3h, DItcosh-DH*3inh) 
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call  csend (mynod,msgl,m3glen*2, 0, pid) 


*  End  of  node  1 

endif 

*11111111111111111111111111111111111111111111111111111111 

*33333333333333333333333333333333333333333333333333333333 

*  Begin  node  3 

if (mynod . eq . 3 ) then 

m3g4 (1) — irecv (0, f (13) ,m3glen) 
m3g4 (2) -irecv (2, t2,m3glen) 
m3g4 (3) — irecv (1, 03C (5) ,m3glen) 

sini2— 1 . 0d0-theta2 
f  (15)  — dsqrt  (sini2) 
hi— tm-f (9) 

03c(9)-hl* (f (2)+hl*(f (3)+hl*f (4) ) ) 

theta4— theta2*theta2 

pi— 1 . OdO-5 . 0d0*theta2 

p2— 35 . OdO+24 . 0d0*eta0+25 . 0d0*eta20 

p3-90 . OdO-192 . 0d0*eta0-126 . 0d0*eta20 

p4-385 . OdO+360 . 0d0*eta0+45 . 0d0*eta20 

p5— 270  .  OdO+126 . 0dC*eta20 

p6-385 . OdO-189 . 0d0*eta20 

p7— 5 . OdO+12 . OdO*etaO+9 . 0d0*eta20 

p8-35 . OdO+36 . 0d0*eta0+5 . 0d0*eta20 

p9-3 . OdO-7 . 0d0*theta2 

*  Receive  a  from  node  0 

call  magvait (m3g4 (1) ) 

*  Compute  g2, g2p, g3, g4, g5 

call  gamma 

*  Compute  h 

f (12)-(-3 .0d0*g2p+.375d0*g2p**2 

&  * (p7-p8*theta2) +1 . 25d0*g4* (5 . OdO-3 . 0d0*eta20) *p9) *f (8) 

ql-f (2)*f (13)**1.5d0 
f (12)-f (12) /ql 
if (kz . ne . 0 . OdO) cf— c (12) 

osc (6) —pie (f (7) ff (12) *o3c (9) ) -cf * (tm-c (1) ) 

*  Send  h  ^  node  2 

call  csend (mynod, osc (6) ,m3glen,  2, pid) 

*  Perform  preliminary  computations  for  sectoral  corrections 

f (11) — 1 . 5d0*g2p*pl+ . 09375d0*g2p**2* (p2+p3*theta2 
&  +p4*theta4) +.3125d0*g4* (21. OdO- 9 . 0d0*eta20+p5*theta2 

&  +p6*theta4) 

f  (ll)-f (ll)/ql 
agda— 10 . OdO 
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call  sector (hi) 


Receive  t2  from  node  2 

call  msgwait (msg4 (2) ) 

msg4 (4) “irecv (2,msg3,msglen*2) 

Complete  preliminary  computations  to  compute  sini  dlh 

tt2-theta2*t2 

pi-  (40 . 0d0’*tt2+16 .  OdO)  *tt2+3 .  OdO 
p2- (2 . 0d02*tt2+80 . OdO ) *tt2+ll . OdO 

vlhli-(5.0d0*g4*pl/ (12 . 0d0*g2p) - . 125*g2p*p2) *f (15) *esq0*f (8) 

p2— 4 . OdO+3 . 0d0*esq0 

p3- (-24 . 0d0*tt2-9 . OdO) *theta2+l . OdO 

vlh2i-( (f (5) *f (8) *.25d0) /g2p) * (g3+ (5 . 0d0*g5/16 . OdO) *p2*p3 
&+15.0d0*g5*  (£(15)  **2)  *p2*pl.'8  .  OdO) 

p2-(-8.0d0*tt2-2.5d0) *theta2+0 . 5d0 
p3-2.0d0*pl-1.0d0 

vlh3i-(-35.0d0*g5*esq0*f (5) *f (8) ) * (p2+p3*f  (15) *f  (15) ) 

&/ (576.0d0*g2p) 

Receive  1,  e  and  a  from  node  0 

call  message (osc (3) , osc (4) ,osc (8) ,mynod,  0, 1) 

Receive  g  from  node  1 

call  msgwait (msg4 (3) ) 

Compute  sectoral  corrections  and  send  to  all  nodes 

agda— 0 • OdO 
agde— 0 . OdO 
agdi— 0 • OdO 
agdl— 0 . OdO 
agdg— 0 . OdO 
agdh— 0 . OdO 

call  sector (hi) 

call  csend (mynod, agda,msglen*6, -1,  pid) 

Compute  sini  dlh 

cosg— dcos (osc (5) ) 

sing— dsin (osc (5) ) 

pl-2 . 0d0*sing*cosg 

p2- (4 . 0d0*cosg**2-3 . OdO) *cosg 

sinidlh— vlhli*pl+vlh2i*cosg+vlh3i*p2 

Receive  cosf,  sinf  from  node  2 

call  msgwait (msg4 (4) ) 

Compute  sini  d2h 

wl7— artnq (osc (2) , osc (1) ) +03c (4) *03c (2) -pie (osc (3)  ) 
p2-2 . 0d0*cosg**2-l . OdO 
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p3- (4 . OdO*osc (1) **2-3 . OdO) *03c (1) 
p4— (3 . OdO-4 . OdO*osc (2) **2) *03c (2) 

p5-pl* (6.0d0*O3C (1) **2-3.0d0)+6.0d0*p2*o3c (1) *03c (2) 
p6-(pl*03c(l)+p2*03c(2) )*3.0d0 
w21»^5+ (p6+pl*p3+p2*p4 ) *03C (4) 

3inid2h— 0.5d0*g2p*f (8) *f (15) * (6.0d0*wl7-w21) 


Compute  DH  and  send  osc(6)  and  DH  to  node  1 

DH»0 . 5d0* (sinidlh+sinid2h+agdh) /dsqrt (0 . 5d0+0 . 5d0*f (8) ) 
call  message (osc  <6) ,DH, 0 . 0d0,mynod, 1, 0) 

End  of  node  3 

endif 


*33333333333333333333333333333333333333333333333333333333 

else 

*222222222222222222222222222222222222222222222222222222222 
*  Begin  node  2 


msg4 (1) —irecv (0, f (13) ,msglen) 
msg4 (2) -irecv (1, osc (5) ,msglen) 
msg4 (3)-irecv(3,03C(6) ,msglen) 

Perform  preliminary  computations  for  dlz 

f (15) —dsqrt (1 . 0d0-theta2) 
tt2-theta2*t2 
esqO-f (5) *f (5) 
eta20=l . OdO-esqO 
etaO-dsqrt (eta20) 
eta30— eta20*eta0 

Receive  a  from  node  0 

call  msgwait (msg4 (1) ) 

Compute  g2,g2p,g3,g4  and  g5 

call  gamma 

Compute  dlz 

pi- (eta30-l . OdO) * . 125d0 

p2- ( (-40 . 0d0*tt2-ll . OdO) *theta2+l . OdO) *g2p 

p3-10 . 0d0*g4/ (3 . 0d0*g2p) 

p4- ( (-8 . 0d0*tt2-3 . OdO) *theta2+l) *p3 

p5- (20 . 0d0*tt2+8 . OdO) *tt2+l . OdO 

p6— (10 . 0d0*p5+l . OdO) *g2p 

p7- (2 . 0d0*p5+l . OdO) *p3 

p8=25 . 0d0*esq0*tt2*tt2*theta2* (g2p- . 2d0*p3) 
p9- ( (-2 . 0d02*tt2-33 . OdO) *theta2+l . OdO) *g2p 
plO-  (  (-40 . 0d0*tt2-9  .  OdO)  *thet.a2+l .  OdO)  *f3 
vlsl— pi* (p2-p4) - . 125dO*e3qO*f (8) *  (p6-p7) 
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&  +p8- . 0625d0*e3q0* (p9-plO) 

pl">«taO+l . OdO/  (1 . OdO+etaO) 
p2-f (8) / (l.OdO+f (8)  ) 
p3-4 . OdO+3 . OdO*esqO 

p4- (-24 . 0d0*tt2-9 . OdO) *theta2+l . OdO 

p5“ (esq0-eta30) *3 . OdO+11 . OdO 

p6- (20 . 0d0*tt2+8 . OdO) *tt2+l . OdO 

p7-(pl+p2) /4.0d0 

p8- (g3+. 3125d0*p3*g5*p4) /g2p 

p9— p5* . 15625d0*g5*p4/g2p 

plO- (1 . OdO-f (8) ) * . 46875d0* (p6*2 . OdO+1 . OdO) *p3*g5*f (8) /g2p 
vls2-(p7*p8+p9+pl0) *f (5) *£ (15) 

p7-3 . 0d0*eta30-3 . OdO- (2 . 0d0+p2) *esqO 
p8- (-16 . 0d0*tt2-5 . OdO) *theta2+l . OdO 

p9- (1 . OdO-f (8) ) *f (8) * . 060763884d0*e3q0* (1 . OdO+4 . 0d0*p6) 
vl33-(p7* . 030381944d0*p8-p9) *f (15) (5)  *g5/g2p 


Receive  g  from  node  1  and  compute  dlz 

call  magwait  (msg4 (2) ) 

sing— dain (osc (5) ) 
coag— dco3 (oac (5) ) 
pll— 2 . OdO*sing*cosg 
pl2-2 . 0d0*co3g**2-l . OdO 

pl3— (coag*co3g-3ing*3ing) *cosg-2 . OdO*coag*sing*sing 

dlz— vial *pll+vl32*coag+vls3*pl3 

Receive  1,  e,  and  a  from  node  0 

call  message (osc (3) , osc (4) ,osc (8) ,mynod^ 0, 1) 
msg4 (4) — irecv (0, ed21,msglen) 

Solve  Kepler' a  equation  and  send  cosf ,  sinf  to  all  nodes 
call  ]cepler 

call  csend (mynod,m3g3,msglen*2, -l,pid) 

Complete  preliminary  computations  for  d2z 

wl7-artnq (osc (2) , osc (1) ) +03c (4) *osc (2) -pie (osc (3)  ) 
p3-(4.0d0*osc<l) **2-3. OdO) *osc(l) 
p4- (3 . OdO-4 . 0d0*o3C (2) **2) *osc  (2) 

p5— pll* (6 . 0d0*osc (1) **2-3 . OdO) +6 . 0d0*pl2*o3C (1) *03C (2) 

p6— (pll*osc (1) +pl2*03c (2) ) *3 . OdO 

w21— p5+ (p6+pll*p3+pl2*p4) *03C (4) 

p7- (-5 . 0d0*theta2+2 . 0d0*f (8) +3 . OdO) *w21 

p8- (-5 . 0d0*theta2+2*f (8) +1 . OdO) *wl7 

Receive  ed21  from  node  0 

call  msgwait (m3g4 (4) ) 

Compute  d2z 

d2z— f (5) *ed21* (pl-l.OdO) /eta30- (6 . 0d0*p8-p7) *g2p*0.25d0 
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*  Receive  h  form  node  3 


call  msgwait (msg4 (3) ) 

Receive  sectoral  corrections 

call  crecv (3, agda,msglen*6) 

Compute  OS 

OS—osc (3) +OSC (5) +OSC (6) +dl2+d22+agdg 


Send  OS  to  node  0 

call  csend (mynod, OS,msglen, 0, pid) 
End  of  node  2 


*22222222222222222222222222222222222222222222222222222222 

endif 

*  Return  to  main  program 

return 

end 


function  pie (x) 

This  double  precision  function  computes  value  of 
X  mod  2*pi 

in^licit  real*8  (a-h,m-z) 

pi2-6.28318530'71d0 
pie-dmod (x,pi2) 
if (pie)  90, 91, 91 

90  ■  pie“pie+pi2 

91  return 
end 


function  artnq(tl,t2) 

This  double  precision  function  computes  inverse  tangent  of  tl/t2 
for  range  0  to  2*pi 

implicit  real*8  (a-h,m-z) 

if (dabs (tl) -dabs (t2) ) 100, 104, 104 

100  artnq—datan (tl/t2) 

if (t2)101,102,102 

101  artnq*artnq+3 . 14159265358979d0 
go  to  105 

102  if (tl) 103, 105, 105 

103  artnq-artnq+6 . 28318530717959d0 
go  to  105 


103 


104 


artnq—atan (-t2/tl) +1 .57079632679489d0 


if (tl)101,105,105 
105  return 
end 


subroutine  recover 

This  subroutine  recovers  the  value  of  a" 
iteratively  from  m 

in^licit  real*8  (a-h,m-2) 
real*8  kz 

common  /ppt/f (25) , osc (9) , u (3) , v (3) , w (3) , vel (3) , r, tm, kz 

common  /oon3/c<25) 

common  /n/theta2, eta, eta2, esq 


f (13)-f (2)** (-2.0d0/3.0d0) 


do  110  i-1,5 

g2p-c  (3)  /  (f  (13)  *eta2)  **2 
g4-c (5) / (f (13) *eta2) **4 

pi- ( (35 . 0d0*theta2) -30 . OdO) *theta2+3 . OdO 
p2-25 . 0d0*eta2+144 . 0d0*eta+105 . OdO 
p3— 90 . 0d0*eta2-96 . 0d0*eta+30 .  OdO 
p4-25 . 0d0*eta+16 . OdO*eta-15 . OdO 
p5-3 . 0d0*theta2-l , OdO 

110  f (13)-( (1.0d0+1.5d0*g2p*eta*p5+,09375d0*g2p**2 

&  * (p4+theta2* (p3+p2*theta2) ) +. 9375d0*g4*eta*esq*pl) 

&  /f (2) ) ** (2.0d0/3.0d0) 

return 

end 


subroutine  gamma 

This  subroutine  computes  the  dimensionless  quantities 

implicit  real*8  (a-h,m-z) 
real*8  kz 

cononon  /ppt/f  (25)  ,o3C  (9)  ,u(3)  ,v(3)  ,w  (3,)  ,  vel  (3)  ,  r,tm,kz 

common  /cons/c(25) 

common  /n/theta2,eta0,eta20,e3q0 

common  /g/g2, g2p, g3, g4, g5 

g2-c (3) /f (13) **2 

g2p-g2/eta20**2 

g3-c (4) / (f (13) *eta20) **3 

g4-c(5) / (f (13) *eta20) **4 

g5-c (6) / (f (13) *eta20) **5 

return 

end 
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subroutine  critincl 


This  subroutine  computes  the  value  of  T2 

inqplicit  real*8  (a-h,m-z) 
realms  kz 

common  /ppt/f (25) , osc (9),u(3),v{3),w(3), vel (3) , r, tm, kz 
common  /n/theta2, etaO,  eta20, esqO 
common  /crit/t2 

theta2-f (8)*f (8) 
tl-1 . OdO-5 . 0d0*theta2 
beta-1. 0d02/ (2.0d0**ll) 

p3— beta*tl*tl 
pi— dexp (-p3) 
p2— 1 . OdO+pl 
plex— pl*pl 
p4-1.0d0 
p3ex— 0 . 5d0*p3 

do  210  i-2,13 

if  (i  .  le  .  11)  p2-p2»  (1 .  OdO+ple:;) 
plex— plex*plex 
p4— p4+p3ex 

210  p3ex— p3*p3ex/dble  (i+1) 

t2— p2*p4*beta*tl 

return 

end 

subroutine  kepler 

This  subroutine  solves  Kepler's  Equation  using  Steffenson's 
Method  and  then  computes  cos  f  and  sin  f 

implicit  real*8  (a-h,m-z) 
real*8  kz 

common  /ppt/f (25) , osc (9),u(3),v(3),w(3), vel (3) , r, tm, kz 

e3— osc (3) 
do  410  i-1,20 
el— e3 

e3— osc (3) +OSC (4) *dsin (e3) 

if (dabs (e3-el) . It . 1 . Od-08)  go  to  420 

e2-e3 

e3— osc (3) +OSC (4) *dsin (e3) 
if (dabs (e3-e2) . It . 1 . Od-08)  go  to  420 
410  e3-e3+ (e3-e2) **2/ (2 . 0d0*e2-el-e3) 

420  cosf— dcos  (e3) 

el-1 . OdO-osc (4) *cosf 
osc (1) - (cosf -osc (4) ) /el 
eta-dsqrt (1 . OdO-osc (4) **2) 
osc (2)-(eta*dsin(e3))/el 


return 

end 


subroutine  sector (hi) 

*  This  subroutine  computes  sectoral  correction  terms  to  be  added 

*  to  the  variation  of  each  of  the  orbital  elements 

implicit  real*8  (a-h,m-z) 
real*8  kz 

dimension  elm (8) , slm (8) ,tf (8) ,tfp(8) ,te(8,8) 

common  /ppt/f (25) , osc (9) , u (3) , v (3) , w (3) ,vel(3) ,r,tm,kz 

common  /cons/c(25) 

common  /n/theta2, eta, eta2, esq 

common  /sec/agda, agde, agdi, agdl, agdg, agdh 

if (agda.ne . 1 . OdOl) go  to  330 
clm(l)-0.35961113d-07 
clm(2)-0.18246786d-05 
elm (3) -0 . 22207398d-05 
clm(4)  ■•clm(3) 
clm(5)-0.37098633d-06 
clm(6)-clm(5) 
clm(7)-0.22580118d-06 
elm  (8) -elm  (7) 

slm(l)-0.24286442d01 
slm(2)-0.57568525d01 
3lm(3)-0.12107857d00 
slm  (4)  —slm  (3) 
slm (5) —0 . 96916794d0 
3lm(6) -slm(5) 
slm(7)-0.11169656d01 
slm(8) -slm(7) 

te(7,l)-0.0d0 
te(7,2)-0.0d0 
te(7,3)-1.0d0 
te  (7,  4)  —1 .  OdO 
te(7,5)-1.0d0 
te  (7,  6)— l.OdO 
te(7,7)-1.0d0 
te(7,8)— l.OdO 
te  (8, 1)-I.0d0 
te(8,2)-2.0d0 
te  (8, 3) — 1 . OdO 
te(8,4)-1.0d0 
te(8,5)-2.0d0 
te (8, 6)-2.0d0 
te(8,7)-3.0d0 
te  (8, 8) — 3 . OdO 

feta— 1 . OdO+eta 
fl-l.OdO+f (8) 
f2-1.0d0-f (8) 
f3-1.0d0+3,0d0*f (8) 
f4-1.0d0-3,0d0*f (8) 
fl52-f (15)*f (15) 
rl-f (2) *f (11) 
r2-f (2)*f (12)-c(12) 

tf  (1)— 1.5d0*f  (8)*f  (15) 
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tf  (2)-1.5cl0*fl52 

tf (3) -0 . 9375d0*fl52*f3-0 . 75d0*fl 
tf (4)-0.9375d0*fl52*f4-0.75d0*f2 
tf (5)-1.875d0*f (15) *fl*f4 
tf  (6)— 1.875d0*f  (15)  *f2*f3 
tf (7) -5 . 625d0*fl52*fl 
tf (8)-5.625d0*fl52*f2 


tfp(l)— 1.5d0*  (f  (8)**2-fl52) 
tfp(2)-3.0d0*f (8) *f (15) 

tfp (3)-f (15) * (1.875d0*f (8) *f 3-2 . 8125d0*f 152+0 . 75d0) 
tfp(4)-f (15) * (1.875d0*f (8) *f 4+2. 8125d0*f 152-0. 75d0) 
tfp(5)-fl*  (1.875d0*f  (8)  ''f4+3.75d0*fl52) 
tfp(6)-f2* (1.875*f (8) *f3-3.75d0*fl52) 
tfp (7)-f (15) * (11.25d0*f (8) *fl-5. 625d0*fl52) 
tfp (8) -f (15) * (11.25d0*f (8) *f2+5.625d0*fl52) 


ta-l.OdO/f (2)/f (13)**5 

flpl-6.0d0 

tg— 1 . 0d0/eta2/eta 

tgp»3 . OdO»f (5) *tg/eta2 


302 

301 


& 

300 


do  300  i-1,8 

if (1-3)301,302,301 
flpl-8.0d0 
tg«tgp/3 . OdO 

tgp- (1 . OdO+4 . OdO*e3q) / (eta2**2*eta2*eta) 
ta-ta/f (13) 

tai-ta*clm(i) / (rl*te (7, i) +r2*te (8, i) ) 
te (2, i) — tai*tf (i) *tg*eta*te (7, i) /f  (5) 
te(3,i)-tai*tf (i) » (te (7, i) *f (8) -te (8, i) ) /f (15) *tg/eta 
te(4,i)-tai*(flpl*f(5) *tg-eta2*tgp) *tf (i) 
te(5,i)«tai*'(tf(i)*f(5)  *eta/ feta*tgp+flpl*tf  (i)  *tg 
+f (15) /f l*tfp (i) *tg/eta) 
te (6, i) -tai*tfp (i) *tg/eta 


330 


go  to  340 
the— oac (6) 

if  ()cz  .eq.  0 .  OdO)  the—the-c  (12)  *  (f  (9)  +hl-c  (1)  ) 


do  341  i-1,8 

sint— osc  (5)  *te  (7,  i)  +the*te  (8,  i)  -slin(i) 
cost— dcos (sint) 
sint— dsin (sint ) 
agde— agde+te (2, i) *co3t 
agdi— agdi+te (3, i) ^cost 
agdl— agdl+te (4, I) *3int 
agdg— agdg+te (5, i) *3int 
341  agdh— agdh+te (6, i) *3int 


340  return 
end 


subroutine  message (dl, d2, d3,mynod, dest, itype) 


This  subroutine  is  used  to  join  to  disjoint  variables 
into  a  single  array  to  be  sent  or  received  in  one  message 
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implicit  real*8  (a-h,m-z) 


dimenaion  msg2 (3) 

integer  mynod, dest, pid, itype^msglen 
data  pid/0/,maglen/8/ 


if (itype . eq. 0) then 
mag2 (1) -dl 
mag2 (2) -d2 
mag2 (3) “dS 

call  caend (mynod, msg2,m3glen*3, deat, pid) 

elae 

call  crecv (deat,m3g2,m3glen*3) 
dl— ni3g2  (1) 
d2— m3g2 (2) 
d3— m3g2 (3) 
endif 


return 

end 

aubroutine  consl 

*  Thia  subroutine  is  used  by  NAVSPASUR  to  set  the 

*  constants  used  by  the  satellite  motion  model 

implicit  real*8  (a-h,m-z) 

common  /cona/c(25) 

beta-398597 . 62579588d0 

ERKM-6378.l35d0 

rLAT-298.26dO 

*  K-TEEMS 

C20— 0.484l605d-03 
C30-0.95958d-06 
C40-0.55199d-06 
C50-0.65875d-07 
c(3)— 0.5d0*C20*d3qrt  (S.OdO) 
c (4)-C30*d3qrt (7.0d0) 
c(5)-0.375d0*C40*d3qrt (9.0d0) 
c(6)-C50*dsqrt (ll.OdO) 

*  TWO  PI 


c (7)-6.283l85307179586d0 
c(16)-1.0d0/c(7) 

HERGS/DAY  ,  SECS/HERG,  MINS/HERG 

c (9) -ERKM»d3qrt (ERKM/BETA) 
c(8)-86400.0d0/c(9) 
c(17)-1440.0d0/r (8) 
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WE (RAD/DAY),  WE-  2  PI,  WE (RAD/HERG) 


A 


c(ll)-6.3003880387d0 

c(10)-c(ll)-c(7) 

c(12)-.c(ll)/c(8) 


EARTH  FLATTENING 

c (13) - (2 . OdO-1 . OdO/FLAT) /FLAT 

SM/ER,  KM/ER,  NM/ER 

c (20) -ERKM/1 . 609344d0 
c (21) -ERKM 
c(22)-ERKM/1.852dO 


DEG/RAD 

c(23)-360.0d0/c(7) 


RANGE  RATE/ER/HERG  TO  CYCLES/SECOND  -  CONVERSION 
c(24)-c(21) *216.980d+06/ (c(9) *2. 997925d+05) 

FENCE  PLANE  DISPLAC  FROM  EARTH  CENTER 
c(25)-0.31000d-02 


return 

end 
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APPENDIX  D 


p*T  SOURCE  CODE  LISTING 

This  appendix  contains  the  listing  of  host  and  node 
progrzuns  comprising  p’T  program  set.  The  host  prograun  loads 
the  node  progreim  and  clears  the  nodes  once  the  process  is 
complete.  The  node  programi  contains  the  instructions  for  the 
nodes  to  complete  their  respective  portions  of  the  P^T 
parallel  algorithm.  The  node  prograun  assigns  Node  0  to  be  the 
distributing  node,  the  highest  numbered  node  to  be  the 
collecting  node,  amd  the  remaining  nodes  to  be  the  working 
nodes .  The  working  nodes  execute  the  original  NAVSPASUR 
subroutine  PPT2  with  only  minor  modifications . 

Host  Prograun; 

program  P3th 

*  This  host  program  loads  the  node  program  P3tn  on  the 

*  nodes  of  the  attached  hypercube.  Upon  completion  of 

*  the  catalog  of  satellite  data,  program  clears  nodes 

*  for  another  process. 

*  Set  host  specific  parameters 

data  pid/0/ 

*  Set  process  id 

call  setpid(pid) 

*  Load  program  P3tn  on  the  nodes 

print*, ' loading  nodes' 
call  load (' p3tn' , -1, pid) 

*  Receive  message  that  nodes  are  complete 

call  crecv (99, istop, 4) 
print*, ' nodes  complete' 

*  Kill  process  on  nodes 
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call  killcxibe  (-1, -1) 


stop 

end 
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Node  Program: 


program  P3Tn 

*  This  program  propagates  n-2  satellites  concurrently, 

*  where  n  is  the  number  of  nodes  belonging  to  the 

*  attached  cube.  Node  0  is  the  distributing  node  and 

*  Node  n-1  is  the  collecting  node.  The  remaining  nodes 

*  are  the  working  nodes  that  propagate  the  satellites 

*  using  NAVSPASUR  subroutine  PPT2.  For  simplicity, 

*  the  tasks  for  all  nodes  are  combined  on  this  one  node 

*  program.  Tasks  are  partitioned  by  logical  if  statements. 

implicit  real*8  (a-h,o-z) 
real*8  kf(lO) 

integer  mynod, numnode, hostid, pid 
integer  mynode, numnodes,myhost,mclock 
integer  eof 

dimension  ar (6,  8 ) , br (3 , 6) , cr (3 , 6) , dv (3 ) 
coRonon/cons/a  (€4) 

common/ppt/f (25) , osc (10) ,kf (10) ,cf(10) ,bs (3,4) ,u(3) ,v(3) ,w(3) ,r, 
6  vel (3) , dind, tm, dkz, dident 

common/dcsub/pe  (6,8),e(8,8),ep(8,8),g(8),gp(8),  ifti  (8)  ,  if  to  (8)  , 
S  iteri, itero, jof , jol, stat (20) , tol (6) , iw, of (11) , ow (8, 8) 
common/ foreo/rho (3 ) , ros, hdr, hdv, rdv,  del, iter 
common/bloc/sat (84, 4800) 

data  iatop/l/,pid/0/,maglen/672/ 
data  isat/l/,n/l/ 

mynod“mynode ( ) 
numnod-v^numnodes  () 
hostid— myhost () 

*000000000000000000000000000000000000000000000000000000 

*  Node  0  reads  and  distributes  data  among  the  working 

*  nodes 


if (mynod. eq. 0) then 

Read  complete  catalog  of  satellite  data 

open (unit— 10, file—' / usr/phipps/in' , form—' unformatted' ) 
read (unit— 10, iostat-eof ) (sat ( j, 1) , j— 1, 84) 

1200  if (eof . ge . 0) then 

isat— isat+1 

read (unit— 10, iostat— eof ) (sat ( j, isat) , j— 1, 84) 
go  to  1200 
endif 

close (unit-10) 

Send  nijmber  of  data  sets  to  all  nodes 

tO-mclock  0 
isat-isat-1 

call  csend (mynod, isat, 4, -1, pid) 


112 


*  Distribute  satellite  data  sets  to  worsting  nodes 

iter— isat/  (nuiiinode-2) 
do  1201  j— l,iter 
do  1201  i— 1, numnode-2 

call  csend (mynod, sat (1, n) ,msglen, i, pid) 

1201  n-n+1 

iter— mod  (isat,  nuinnode-2) 
n— n-1 

do  1202  j— l,iter 

1202  call  csend (mynod, sat (1,  n+j)  ,m3glen,  j,  pid) 

*  End  Node  0 

*00000000000000000000000000000000000000000000000000000 

*wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 

*  Begin  working  nodes 

else 

if (mynod. It ,numnode-l) then 

*  Use  stibroutine  consl  to  set  constants 

call  consl 

*  Receive  number  of  data  sets  from  Node  0  and  compute  total  number 

*  of  satellites  to  propagate 


call  crecv (0, isat, 4) 
t0-mclock() 

if (mynod . le .mod (isat , numnode-2) ) then 
iter— isat/ (numnode-2) +1 
else 

iter— isat/ (numnode-2) 
endif 

Receive  satellite  data,  execute  PPT2,  and  send  results  to 
collecting  node 

do  1220  i— l,iter 

call  crecv (0, f,msglen) 

Set  parameters  for  subroutine  ppt2 

ind— 1 

kz— idint (dkz) 

Compute  secular  recovery 

call  ppt2(ind,kz) 

Compute  subsequent  task,  ie .  predict  position,  update  elements  ... 

ind— idint (dind) 
call  ppt2(ind,kz) 

1220  call  csend (mynod, f,m3glen, numnode-1, 0) 
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End  Working  Nodes 


*wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 

♦cccccccccccccccccccccccccccccccccccccccccccccccccccccc 

*  Begin  Collecting  Nodes 

else 

*  Receive  total  number  of  satellite  sets 

call  crecv (0, isat, 4) 

*  Collect  results  from  Working  Nodes 

do  1230  i— Ifisat 

1230  call  crecv (-1, sat <1, i) ,m3glen) 

*  Write  results  to  external  file 

open (unit"!!, file-' /usr/phipps/out' , form—' unformatted' ) 
do  1231  i— l,isat 

1231  write (unit— 11, *) (sat ( j, i) , j— 1, 84) 
close (unit-11) 

*  Send  message  to  Host  that  process  is  complete 

call  csend (99, istop, 4, hostid,pid) 

*  End  Collecting  Node 

*CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

endif 

endif 

stop 

end 
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